For several reasons, graphics chip development is constrained by transistor budgets. Features have to be prioritized: some matter more, others less. At any given point in time chip complexity is limited, so developers always need to strike a balance between performance and functionality, which in a broader sense includes image quality.
ATI have traditionally cut corners in the texture filtering department. Take, for instance, R200 (Radeon 8500): on the one hand it offers pixel shading that was quite sophisticated for its time, support for overbright lighting and presumably increased internal color precision; on the other hand it has disadvantages in texture filtering, in particular anisotropic filtering ("AF").
We are referring not only to the AF being restricted to bilinear filtering ("BF"), or to its extreme dependence on surface angle, but also to obvious flaws in the determination of the MIP map LOD: as far as we know, ATI's LOD calculations can sometimes lead to texture aliasing with AF enabled. As we never researched this in depth at the time, we don't want to press the point. Instead, we'd like to focus on filtering issues with more current hardware: the R3x0 family.
With this family, ATI finally realized that trilinear AF does make sense, and they also reduced the dependency on surface angle. Yet there are still certain angles that get only 2x AF with 16x AF selected. Although this is old news with ATI, we'd like to point out once more that activating 16x AF will yield not 16x AF but only 2x AF at certain angles. Does this truly deserve the name "anisotropic filtering"?
Yes, it does. Anisotropic means just that: not isotropic. The R300's AF indeed doesn't filter isotropically (at any surface angle). When looking for a more precise, binding definition of AF we came up empty, but the quality to be expected from "textbook AF" is no secret. We will deal with the inner workings of anisotropic filtering in a future article. At this point, we just want to state how "perfect" AF needs to look: exactly like trilinear filtering (TF), but with the MIP levels pushed "back" by one level for each doubling of the degree of anisotropy. Not to beat about the bush: "perfect" AF is offered by practically no current hardware.
The one laudable exception is 2x AF on the GeForce series, but 4x AF is already slightly "optimized" (at 45° angles), and 8x AF noticeably so (also at 45° angles). Kyro graphics cards are limited to 2x AF but handle this mode to perfection. The formula for optimal AF is so complex that it is common to economize on it, accepting deviations from "pure AF" in exchange for savings in transistor budget.
Saving transistors was apparently ATI's motive, too. Implementation details are not available to the public, of course, but it is likely that ATI shied away from implementing the transistor-intensive square root function required for AF. If you leave that out, the resulting circuit can at first only apply full anisotropy at angles that are multiples of 90° (which would explain the angle dependency of R100 and R200). With a comparatively cheap extension, still without square root circuitry, multiples of 45° become doable as well.
This theory is based on some of Demirug's ideas and is, for now, just our assumption. Regardless of such theories, two things can be said for sure:
* ATI saves transistors in the calculation of AF sampling coordinates.
* Fill rate is saved as well, because over the full circle the selected degree of anisotropy is only partially applied.
Page 1, ~second half (huha! huha! huha!)
This makes it easy to assert that ATI has the faster AF. For a long time, AF was more or less just a checklist feature with questionable implementations. GeForce series graphics cards were about the first to deliver "reasonable" AF, but were limited to 2x. The GeForce 3 could do up to 8x AF which, however, wasn't promoted by marketing. Presumably it wasn't deemed beneficial to recommend this "make everything slow" feature. Properly investigating this feature early on, and judging it by quality rather than by frame rate as the majority did at the time, was an achievement of 3DConcept founder Raphael auf der Maur (ram).
ATI's original Radeon (R100) hit the market earlier than the GeForce3 and could do 16x AF – albeit with limitations so severe that we don't think it's a useful "AF solution". The GeForce4 Ti's (NV25/NV28) bilinear AF performance is hampered, apparently because of transistor count considerations. And the GeForce FX (NV3x) can exchange AF quality for performance.
That isn't exactly our idea of progress. When balancing fill rate against quality, "textbook AF" is simply the optimum. Since NV20, NVIDIA has implemented a pretty nice calculation of AF sampling points anyway, so it's a real pity that "8x AF" is offered at the performance level of 4x AF but with overall quality that's worse than that. It's fair enough to call this mode "8x AF" because some pixels are treated this way, but, of course, we'd rather utilize a given mode to its fullest before activating a higher one. Maybe NVIDIA merely reacted to the pressure ATI put them under and wanted to show off high frame rates rather than making any given AF level work as well as possible (as should actually be expected of a quality feature).
Before we jump in, we'd like to clear up a possible source of confusion: we don't take issue with ATI's 16x AF not filtering everything at 16x. 1x, 2x, 4x, 8x, and finally 16x filtering is selected dynamically, as required (NVIDIA's hardware performs the same dynamic selection). The selection depends on the angle of incidence of the surface in question. If the texture is distorted at a 1:2 ratio, 2x AF is enough; there aren't any tangible quality gains to be had beyond that. What's different with ATI is the extreme dependency on surface angle. A weaker form of this dependency can be observed on NVIDIA hardware, too, starting at 4x AF.
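To make the dynamic selection more tangible, here is a minimal sketch of how a "textbook" AF degree and MIP LOD could be derived from the pixel footprint. It is modeled on the formula given in the OpenGL EXT_texture_filter_anisotropic extension text, not on any vendor's actual circuit; the function name `textbook_af` and its parameters are our own, and real hardware restricts the sample count to 1, 2, 4, 8 or 16 rather than using a plain ceiling.

```python
import math

def textbook_af(dudx, dvdx, dudy, dvdy, max_aniso=16):
    """Return (sample_count, lod) for one pixel footprint.

    The four derivatives are texel offsets per screen pixel.
    Illustrative sketch only; not ATI's or NVIDIA's implementation.
    """
    # Footprint lengths along screen x and y. These lengths are
    # where the square root enters the calculation.
    px = math.hypot(dudx, dvdx)
    py = math.hypot(dudy, dvdy)
    p_max, p_min = max(px, py), min(px, py)
    if p_max == 0.0:
        return 1, 0.0  # degenerate footprint, nothing to filter
    # Distortion ratio decides how many samples are needed.
    ratio = p_max / p_min if p_min > 0.0 else float(max_aniso)
    n = max(1, min(math.ceil(ratio), max_aniso))
    # Each doubling of n lowers the LOD by one MIP level.
    lod = max(0.0, math.log2(p_max / n))
    return n, lod
```

For a footprint stretched 16:1, `textbook_af(1, 0, 0, 16, max_aniso=1)` behaves like plain trilinear filtering and picks LOD 4. With `max_aniso=2` the LOD drops to 3, and with `max_aniso=16` it reaches 0: one MIP level sharper per doubling of the AF degree. A 1:2 distortion never requests more than two samples, whatever maximum is selected.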
Unfortunately ATI went a bit further than the competition did with respect to texture filtering logic simplifications and, as a consequence, deviations from textbook quality.
Base filter simplifications
A trilinear filter is a linear interpolation between two bilinear samples, requiring a weight between 0 and 1. ATI allocate five bits for this weight, which matches Direct3D's reference rasterizer (however, higher precision is allowed by Direct3D and in fact desirable). In OpenGL, SGI, who spearheaded this API, use eight bits. That is also the standard followed by, eg, NVIDIA's GeForce range, which implements the 8 bit linear interpolation weight for both OpenGL and Direct3D.
These three additional bits result in an eightfold increase in resolution. Do we "need" that? In our opinion, at least six bits of "LOD fraction" are desirable to minimize banding artifacts. Five bits are okay for most cases, while four bits are definitely too few. Eight bits may be slight overkill, but then there's no disadvantage to precise texture filters. In any case, textbook quality is eight bits: this guarantees zero banding and is also SGI's recommendation for OpenGL.
"5 bit LOD fraction" issues are hard to demonstrate using screen shots, significantly harder than pointing out the issues with NVIDIA's "brilinear" filtering. We still can't help wondering why it would be necessary to make savings in this area, while at the same time there's pixel shading hardware operating on 96 bits (4x 24 bit floating point) and 6x sparse, gamma corrected multisampling anti-aliasing. ATI apparently went for maximum savings in texture filtering logic. Even bilinear filtering fell victim to these "optimizations".
We're not quite sure what causes these block artifacts. But we do know how a bilinear filter should look, and that the competition does offer textbook quality, as would be expected from any current graphics card. Incidentally, this "optimization" has been in place since R200 at the latest.
Creating the bilinear sample requires knowledge of the exact sampling position for the filter kernel. As with the borderline acceptable LOD fraction precision, ATI appear to have made some savings in this area. The sample coordinates' fraction bits are used to calculate a weighting matrix for the source texels. Maybe this calculation was subject to complexity reductions, involving lookups to skip some arithmetic.
These artifacts won't be seen with tiny textures; about the reasons we can only speculate at this time. We don't know where and how savings were made, we only know that they have been made. In short: current Radeon-based cards can't deliver textbook quality bilinear filtering in certain circumstances. GeForce-based cards don't have these issues.
This also applies to the weight matrix's quantization. While GeForce chips implement eight bits, ATI chips have to make do with six. More than eight bits wouldn't make much sense because the textures' and frame buffer's color channels are only eight bits wide. However, if even a single bit less is used, ie seven bits, the full color range of the frame buffer can't be used anymore. There would still be more than 2^7=128 gradients in total, but that doesn't help with pixel lines that sit exactly between two texels. To wrap it up: fewer than 8 bits of resolution lead to block artifacts under heavy zoom.
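The effect of the LOD fraction width can be illustrated with a small sketch. It assumes the fraction is simply truncated to the given number of bits; how real chips round is not known to us, and the function names are made up for this example.

```python
def quantize_lod_fraction(fraction, bits):
    """Truncate a LOD fraction in [0, 1) to the given number of bits."""
    steps = 1 << bits
    return int(fraction * steps) / steps

def trilinear(sample_fine, sample_coarse, fraction, bits):
    """Blend two bilinear samples with a quantized weight."""
    w = quantize_lod_fraction(fraction, bits)
    return sample_fine * (1.0 - w) + sample_coarse * w

def distinct_blend_levels(bits, probes=4096):
    """Count how many different weights occur across one MIP transition."""
    return len({quantize_lod_fraction(i / probes, bits) for i in range(probes)})
```

`distinct_blend_levels(5)` gives 32 blend steps between two MIP levels, `distinct_blend_levels(8)` gives 256, which is the eightfold difference mentioned above. The fewer steps there are, the larger the jump in the blended color from one step to the next, and that jump is what can show up as banding.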
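The argument about weight resolution can be checked with a one-dimensional sketch: a black and a white texel are heavily magnified, and we count how many different 8 bit results a fixed-point weight of a given width can produce between them. The rounding scheme and the names are assumptions for illustration, not a description of either vendor's filter.

```python
def lerp_quantized(a, b, t, bits):
    """Interpolate two 8 bit texel values with an n-bit fixed-point weight."""
    steps = 1 << bits
    w = int(t * steps)  # quantized weight, 0 .. steps-1
    # Integer blend with rounding to the nearest 8 bit value.
    return (a * (steps - w) + b * w + steps // 2) // steps

def distinct_outputs(bits, a=0, b=255, probes=4096):
    """Count distinct results when sweeping from texel a towards texel b."""
    return len({lerp_quantized(a, b, i / probes, bits) for i in range(probes)})
```

With 6 bit weights the sweep from black to white can take on only 64 different shades, with 7 bits 128, while 8 bit weights cover nearly the whole 8 bit range. Stretch such a texel pair across a few hundred screen pixels and the missing shades turn into visible blocks.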
This, too, is very hard to prove with "realistic" (ie in-game) screen shots. As long as there are higher resolution mip map levels to use, the bilinear filter will at most zoom a mip level to double size, obscuring these artifacts. To reveal them, a high contrast, high frequency texture must be heavily magnified.
So these issues are hard to spot. Doesn't that make it alright? We don't think it's okay to go below textbook quality on bilinear filters. They are the foundation for, eg, function lookups used in pixel shading. The better the bilinear filter, the better the end result, obviously. Bilinear is the simplest of all texture filters, and we'd expect it to be implemented without compromise.
Unfortunately ATI set their priorities a bit differently, as demonstrated by mip map LOD calculation:
<Image #3>
Page 3, last part (...)
The GeForce card exhibits imperfections the size of a quad in this example (a quad is a 2x2 pixel block, and LOD calculations are performed per quad, not per pixel). The Radeon card, on the other hand, produces a chaotic pattern with wildly varying LOD. Apparently the LOD calculation was implemented with as few transistors as possible, sacrificing accuracy. GeForce cards aren't outright perfect either; we can produce situations where they, too, show "dithering" patterns. Still, the Radeon cards start doing it much earlier.
<End of page 2>
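As a rough sketch of what "LOD per quad" means: the texture coordinates of the four pixels in a 2x2 block are differenced, and a single LOD is derived for the whole block. The formula below is the usual isotropic one from the OpenGL specification (log2 of the longer footprint edge). Which pixel pairs a real chip differences, and with what precision, is an assumption on our part.

```python
import math

def quad_lod(u, v):
    """Compute one LOD for a 2x2 pixel quad.

    u and v are 2x2 nested lists of texel-space coordinates,
    indexed [row][column]. Illustrative sketch only.
    """
    # Finite differences taken from the top-left pixel.
    dudx = u[0][1] - u[0][0]
    dvdx = v[0][1] - v[0][0]
    dudy = u[1][0] - u[0][0]
    dvdy = v[1][0] - v[0][0]
    # Longer footprint edge decides the MIP level.
    rho = max(math.hypot(dudx, dvdx), math.hypot(dudy, dvdy))
    return max(0.0, math.log2(rho)) if rho > 0.0 else 0.0
```

For example, `quad_lod([[0, 4], [0, 4]], [[0, 0], [2, 2]])` yields LOD 2.0 for all four pixels. Because the result is shared by the whole block, any error in it shows up with quad granularity, which matches the quad-sized imperfections described above.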
Save to the max
ATI hardware is very carefully designed. The cut corners in texture filtering we've been criticizing are hardly noticed in practice. It seems to have been a basic design philosophy for R300 to offer only the precision that's needed. This is how we'd like to back that up:
ATI's pixel shading hardware runs at 24 bits of floating point (FP) precision, formally "s16e7" (ie sign, 16 bits for the normalized mantissa, 7 bits for a base-two exponent; we defer a discussion of what exactly that means to a later point in time). For pure color operations FP24 is in fact already "too much". Let's investigate.
Textures usually yield 8 bits of integer precision per color channel. That's exactly frame buffer precision. It's still desirable to use higher levels of precision for shader calculations, because they can involve many operations.
NVIDIA's 16 bit floating point format is sufficient for a lot of cases but not for all: floating point is not per se better than fixed point, and there are cases where a 16 bit fixed point format (FX16) is better than FP16. Thus NVIDIA will presumably offer FX16 on NV40. FP24's precision, however, is always as good as or better than that of the CineFX proprietary FX12, FX16 and of course FP16.
Page 3, part two :karate:
So far, so good. Apart from color operations, there are also operations that modify sampling points, eg matrix multiplications (2D transformations) or dependent reads (for eg environment mapped bump mapping). This is the realm of texture coordinates, modified inside the shader. Texture coords usually lie in the [0...1] range, but values outside of that range are just as valid.
With a 16 bit mantissa we get 2^16=65536 distinct values in the [0.5...1] range. Sounds like a lot of precision. With large textures, eg 2048x2048, that leaves us with 64 steps between two texels, ie 6 bits of "fraction". Isn't that enough?
It is, for isotropic filters. When using AF, this coordinate is a starting point for calculating more sampling positions. These calculations are most likely performed with FP32. We already learned that the bilinear filter (the fundamental building block for both trilinear and anisotropic filters) has been heavily "optimized" and is not as precise as it should be.
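The arithmetic behind the "6 bits of fraction" can be written down in a few lines. The helper below is purely illustrative. It looks at the [0.5...1] range because that is where a binary float resolves coordinates within [0...1] most coarsely; the mantissa widths used for comparison (10 bits for FP16, 23 bits for FP32) are the standard ones for those formats.

```python
import math

def subtexel_bits(mantissa_bits, texture_size):
    """Bits of sub-texel precision for coordinates in [0.5, 1)."""
    values = 2 ** mantissa_bits   # evenly spaced float values in [0.5, 1)
    texels = texture_size // 2    # texels covered by that half of the texture
    return math.log2(values / texels)
```

`subtexel_bits(16, 2048)` returns 6.0, ie the 64 steps per texel mentioned above. For comparison, FP32 would give 13.0 bits on the same texture, while FP16 would give 0.0, which is no sub-texel precision at all.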
Because of AF's adaptive nature, many samples are generated from smaller resolution mip map levels, so precision issues are somewhat reduced. FP24 is adequate for coordinate calculations with the level of effects we expect from pixel shader 2.0, even though some pixel shader 1.1 texture operations are already performed with FP32 on the GeForce 3.
Unless there's heavy magnification involved, the block artifacts in ATI's bilinear filtering only show up as a slight color skew and are hardly visible. AF still requires exact positioning of AF samples. As far as we know, the FP32 precision that's used for texture lookups doesn't quite produce the same quality as competing products.
Page 3, part three :flower:
It all fits together: there's only FP24 for operations on texture coordinates, yet there's no apparent disadvantage because of the simplifications in BF, TF and AF. This is how R300 offers unmatched performance, but doesn't deliver the best image quality. From an "ethics" point of view (whatever that means to ATI and NVIDIA) the competition can easily reduce image quality through drivers, to squeeze a bit of extra performance out of the chips and keep up.
Of course, hardware solutions must work in practice, and optimal texture filtering implementations come at a premium in transistors. It's not per se wrong to investigate this aspect for possible savings, as long as side effects are negligible. Even though ATI's isotropic filtering is mostly sufficient for gaming, we criticize their going below textbook quality (ie SGI's 8 bits). Isotropic filtering close to perfection is something we expect from any gaming hardware.
ATI, however, are aggressive when it comes to savings: only 5 bits of LOD fraction, only 6 bits of resolution for filter weights. On top of that comes a rather peculiar LOD determination scheme that seems to lack a lot of precision in some situations.
When it comes to AF we're also skeptical of the extreme angle dependency. It's something we're used to with ATI, but that doesn't make it a valid solution going forward: great pixel shading on the one hand shouldn't be paired with performance optimized filtering schemes that have definite drawbacks in quality on the other.
Page 3, part four :freak:
R300's texture filtering logic is an improvement upon R200's: the BF block artifacts are still there, and AF has improved over R200 but still won't match the competition's quality. The presumed LOD bug in R200's AF makes a comeback in R300, albeit in a slightly different form. With LOD biases greater than zero there's LOD clamping instead of biasing (ie the highest resolution mip map level is never used alone, which reduces maximum detail).
We believe it's a hardware issue, even though it can likely be worked around with drivers. And even granting that the LOD bias should always be exactly zero anyway, R300's texture filtering does leave a lot of room for improvement, both for isotropic and anisotropic modes.
This is somewhat disappointing: whatever area of R300 we were poking at, we've always found something that could have been done better, as demonstrated by the competition's textbook quality. It's likely that there are even more filtering simplifications on the Radeon that we simply haven't found yet.
We'll make filtering quality an important part of our testing methods for future hardware, regardless of vendor. Frame rate is only one part of the equation; a fair benchmark should always consider image quality as well. Texture filtering is the foundation of all 3D realtime rendering, regardless of whether you use a multitexture pipeline or pixel shaders. Optimum filtering quality isn't an added bonus, it's a requirement.
Thanks go out to Demirug, whose ideas were invaluable to our investigation on filter quality, to Xmas for hints, corrections and screen shots, to Argo Zero and Axel for screen shots, and also to Ailuros, who unintentionally got the ball rolling :-)
