Mr. Lolman
2007-04-24, 17:30:35
Current translation status from Graham Penny (freelance translator from the 3dfxzone forum):
[Page 1]
15 December 2000 signalled the end of an era: on that date over six years ago, 3dfx, the pioneers of filtered polygon acceleration on PCs, closed its doors. nVidia, their main rival, bought them out for 112 million dollars - patents, technologies, the lot. It was a black day for the 3dfx community, which until then had been considerable in size. One question remained, however: could they have signed off with a flourish?
This article is dedicated to answering precisely that question by looking at the fastest of the Voodoo 5 graphic cards, an artifact of days long past, namely the Voodoo5 6000 AGP, the masterpiece that never officially made it out of 3dfx's doors.
Background
Ah, but we remember it well: the graphics card made quite an impact when it was first presented to the public at Comdex in early 1999, astonishing everyone with its four graphics processors. What we didn't know then was that the card would never make it to retail.
The model that was displayed then was based on the "original design", which still used a 2x2 chip arrangement - and was nowhere near functioning. The development process revealed major issues with that layout, and it was eventually discarded in favour of the four-in-a-row arrangement that has since become famous. Even that caused problems right up to the end, primarily with instability under heavy load, a bug caused by the PCI subsystem that continues to plague all existing Voodoo5 6000 prototypes. It is primarily this bug that prevented the Voodoo5 6000 from appearing on store shelves.
Subsequent efforts to remedy the problem, such as underclocking newer revisions from the anticipated 183 MHz to 166 MHz, were also in vain. Then the axe dropped, brought on by the massive development costs and repeated disappointing quarterly results. On 13 November 2000 3dfx reported that it had transferred all of its SLI patents to its Quantum3D subsidiary (which still exists today) and had pulled out of the graphics card business. All the Voodoo5 6000 prototypes that had been made up to that point also went to Quantum3D. Well, almost all of them...
It is estimated that some 200 such prototypes exist, but not all of these are working cards - word on the Internet is that that figure is more like 100. Worldwide. Who it was that smuggled some of these out from under Quantum3D's noses will likely forever remain a mystery, and the number of V56k cards that have ceased to function in the meantime is similarly unknown. Most prototypes have found their way into the hands of genuine fans, but even today cards appear for sale on online auction sites from time to time.
All of these 6-year-old graphics cards have one thing in common: they cost a pretty penny. A defective card will set you back at least 500 euros, while a working Voodoo5 6000 clocked at 183 MHz can easily cost two or three times as much. However, that figure becomes much more palatable when considered against the background of the card's "population density" and the cost of current high-end graphics cards, as unlike the latest products to come off ATI and nVidia's respective production lines, the Voodoo5 6000 has maintained its value astonishingly well over the years - much to the disappointment of those of us with not quite so much cash to splash.
This is where we come in. It took us almost 6 years, but today we are able to proudly present what we were all so cruelly denied back then - a review of the Voodoo5 6000. Sure, we know we're not the first, as the odd test has appeared on the Net in the intervening period (not to mention in German print magazine PC Games Hardware last year), but we can guarantee that we have found the answers to all the questions about 3dfx's last functioning product - those questions that we asked ourselves when we read the other tests (and more besides).
The test cards
Two identical Voodoo5 6000 cards were used for this article, both of which were examples of the best known and most stable revision built, the "Final Revision 3700-A". The numbers indicate that they were produced in the 37th calendar week of 2000. Prototypes like these can almost be described as fully functional graphics cards - and that caveat is only because they, too, display the PCI bug described above. The kicker is that this problem was also solved, but unfortunately only after 3dfx had invested copious resources in it and, ultimately, collapsed.
Hank Semenec, the "Godfather" of the Voodoo5 6000, who now plies his trade at Quantum3D, came up with the so-called "PCI Rework" in his free time, which removed the instabilities. The bug fix manifests in two ways, one internal and one external, each of which was in evidence on one of our test cards. With the fix, both are fully useable and the revolutionary AA modes, which play a significant role in the tests to follow, are completely stable. We must also thank Hank Semenec for the repair that allowed one of our 2 Voodoo5 6000 cards to even function. Our thanks once again!
[The Voodoo5 6000 AGP Rev. 3700-A is exactly 31 centimetres in length. At one end you can see the upside-down HiNT bridge, to the left of the GPUs, together with the power supply soldered directly to the PCB. The original "Voodoo Volts" power supply concept is not really necessary with current PSUs (and is very rare anyhow), but at the time it was the only way to make some computers Voodoo5 6000 compatible (Image: © PC Games Hardware).]
[On the rear can be seen masses of SMDs (Surface Mounted Devices) laid out in what was at that time a very high density (Image: © PC Games Hardware).]
[The external PCI rework on our test card. Note the "Not for Sale" sticker just next to it, evidence that the card was clearly a prototype (Image: © PC Games Hardware).]
Before delving into framerates and the like, we're first going to take a look at the underlying technology, the resulting picture quality and, of course, the test settings.
[Page 2]
The VSA-100 chip
The last generation of cards from Messrs. 3dfx was still based on the original Voodoo Graphics design dating back to 1996. This is also why "Rampage", the true successor to Voodoo, never reached the public, despite several fresh attempts. In short, this was caused - as with so many of the negative aspects of 3dfx - by insufficient R&D resources.
Unfortunately, modern features such as environment and per-pixel lighting and hardware-accelerated transformations didn't find their way onto the VSA-100. In reality that was an irrelevancy for this generation of cards, as there were next to no games that used such features. 3dfx went in a different direction with the VSA-100, implementing some of the ideas from Rampage that could be used in every game and noticeably improved picture quality - in particular the famous "T-Buffer" (the "T" was from the surname of Gary Tarolli, 3dfx's co-founder and chief designer). It enabled:
* scaleable SG-FSAA (sparse grid full-scene antialiasing - more about this later)
* depth of field (depth blurring, to increase realism in games)
* motion blur (a type of temporal antialiasing in which the frame rate is effectively divided in order to merge several frames into one; the amount of temporal data in the end image therefore increases with each frame used. It should not be confused with motion trail, which simply blurs between consecutive frames; see the accumulation sketch after this list.)
* soft shadows (smooth graduation of shadows)
* soft reflections (smooth graduation of reflections)
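All of these effects rest on the same mechanism, so a minimal sketch may help to make it concrete (in Python, purely our own illustration and not anything 3dfx ever shipped): the scene is rendered several times, each pass with a small offset - a subpixel jitter for antialiasing, a time offset for motion blur, a lens offset for depth of field - and the passes are averaged into the final frame.

# T-Buffer principle: average N complete renderings of the scene, one per offset.
# render_scene is a stand-in callback for the GPU's work on a single pass.

def t_buffer_accumulate(render_scene, offsets):
    frames = [render_scene(offset) for offset in offsets]
    height, width = len(frames[0]), len(frames[0][0])
    out = [[0.0] * width for _ in range(height)]
    for frame in frames:
        for y in range(height):
            for x in range(width):
                out[y][x] += frame[y][x] / len(frames)
    return out

# Illustrative (not 3dfx's actual) subpixel offsets for a 4-sample sparse grid;
# swapping these for per-pass time or lens offsets turns the same buffer into
# motion blur or depth of field.
jitter_4x = [(0.125, 0.375), (0.375, 0.875), (0.625, 0.125), (0.875, 0.625)]

def dummy_render(offset):
    dx, dy = offset
    return [[dx + dy, dx], [dy, 0.0]]      # a trivial 2x2 "image"

blended = t_buffer_accumulate(dummy_render, jitter_4x)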
The transistor budget for the VSA-100 was clearly invested in speed and scalability rather than on a checklist of features. Incidentally, "VSA" is an abbreviation of Voodoo Scalable Architecture, and an eponymous one at that: by scaling the number of chips on the graphics card it was possible to accommodate every segment of the graphics card market at once, resulting in massive savings on R&D resources, because only one GPU had to be developed and completed.
This was made possible by the use of scanline interleaving (SLI - not to be confused with nVidia's SLI). The frame to be rendered was split into lines, and each GPU was responsible for rendering its own section. More SLI'd GPUs meant less work for each of the VSA-100 chips to perform. The RAMDAC would then assemble the fully rendered "line combs" from the GPUs. This has the advantage over the now standard AFR/SFR approach that there is no need for CPU-intensive load balancing or the precalculation of several images, as all of the GPUs are working on the same one.
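As a rough illustration of how the work is distributed (our own sketch; the real hardware's exact line allocation is not documented here, so a simple round-robin assignment is assumed):

# Scanline interleaving across N GPUs: lines are dealt out round-robin, rendered
# independently, and the "line combs" are merged again for scan-out.

def assign_scanlines(height, num_gpus):
    return [list(range(gpu, height, num_gpus)) for gpu in range(num_gpus)]

def assemble(partial_frames, height, num_gpus):
    frame = [None] * height
    for gpu, lines in enumerate(assign_scanlines(height, num_gpus)):
        for local_index, y in enumerate(lines):
            frame[y] = partial_frames[gpu][local_index]
    return frame

# With 4 VSA-100 chips and a 768-line frame each chip renders only 192 lines.
print([len(lines) for lines in assign_scanlines(768, 4)])   # [192, 192, 192, 192]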
Unlike with AFR (Alternate Frame Rendering), the video memory is also involved in scaling, at least in part. Although there is redundant retention of textures due to the GPUs not having access to the texture memory of each of the other GPUs (which would also have been extremely difficult to achieve), the frame buffer required for each GPU actually ends up being smaller, because the image being rendered by each of them is smaller.
The bandwidth scaling is also excellent, with each VSA-100 having its own memory interface. On a Voodoo5 5500 this meant two independent 128-bit SDR SDRAM interfaces. This should be considered far more effective than a single 128-bit DDR SDRAM interface, because it means that there is less wasted bandwidth and there are fewer stalls in the graphics pipeline caused by memory reads. It's for precisely this reason that ATI and nVidia subsequently split their memory interfaces into several smaller ones.
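Some quick arithmetic illustrates the point (our own calculation, using the clock speeds from the table further down): raw bandwidth is bus width times clock, and it is identical for two 128-bit SDR channels and one 128-bit DDR channel at the same clock - the difference lies in having two independent channels instead of one.

# Bandwidth of an SDR SDRAM interface in GB/s: bus width (bytes) x clock.
def sdr_bandwidth_gb_s(bus_bits, clock_mhz):
    return bus_bits / 8 * clock_mhz * 1e6 / 1e9

per_chip_5500 = sdr_bandwidth_gb_s(128, 166)       # ~2.66 GB/s per VSA-100
voodoo5_5500  = 2 * per_chip_5500                  # ~5.3 GB/s across two channels
single_ddr    = 2 * sdr_bandwidth_gb_s(128, 166)   # same raw ~5.3 GB/s, but one channel
voodoo5_6000  = 4 * sdr_bandwidth_gb_s(128, 183)   # ~11.7 GB/s across four channels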
All of these advantages mean that the SLI process gives an unrivaled level of efficiency when scaling. It was possible - and worthwhile - to link together up to 32 such GPUs, but this remained the preserve of the professional military simulators produced by 3dfx subsidiary Quantum3D. For the consumer market, if we include the Voodoo5 6000 this would have been limited to 4 GPUs per graphics card.
The reason this approach died a death following 3dfx's demise is simple: it suffers from one major disadvantage once hardware-accelerated transformation enters the picture. All of the GPUs in an SLI setup have to calculate the same geometry, meaning that geometry performance does not scale accordingly, although the TnL-less VSA-100 chip was only mildly affected by this (presumably the Rampage was intended to have an external geometry unit). In addition, the texture bandwidth does not scale linearly in an SLI setup, since there was no way of determining which texture parts were required and which were not, only whether textures needed to be available in their entirety. Again, however, the VSA-100 apparently had enough bandwidth per texel to ensure that this did not become a more significant disadvantage.
In terms of the standards in 2000, the VSA-100 incorporated 32-bit framebuffer support and increased the maximum texture size from the antiquated 256x256 pixels to 2048x2048, making hi-res textures a reality on Voodoo cards. To implement this feature sensibly and prevent it from overloading the texture bandwidth 3dfx - like its competitors - used texture compression.
While the Voodoo3 supported NCC (narrow channel compression), this proprietary system was never fully used by games. The VSA-100, by contrast, also made use of S3 Texture Compression (S3TC), which was widely adopted. Because using S3TC in OpenGL games meant paying a licence fee to S3 Graphics, however, 3dfx developed a similar system called FXT. Metabyte's well-known OpenGL Glide wrapper "WickedGL" allows users to convert every texture in OpenGL games to the FXT format.
The VSA-100 was the first Voodoo chip to have 2 independent pixel pipelines, meaning it could process 2 different pixels per cycle, which was often more efficient than the Voodoo3's multi-texturing pipeline. The design was therefore widened, but shortened at the same time. Rival companies were already using quad pipelines to texture polygons, which sounds progressive, but in this instance that was not the case:
Broadly speaking, a quad pipeline is a single pipeline repeated four times, which can calculate one pixel quad, i.e. a 2x2 pixel array (shown in blue on the diagram) per cycle. This approach saves considerably on the number of transistors needed for operating logic, but it also means that all of the pixel pipelines are limited to performing the same operation, which particularly with small triangles is not always optimal (as shown by the white pixel within the blue line on the diagram). The VSA-100, like all pre-GeForce renderers, does not suffer from this.
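A small sketch may help to illustrate the point about small triangles (our own toy model, not vendor code): a quad-based rasteriser always processes whole 2x2 blocks, so every touched block costs four pipeline slots even when only one of its pixels actually lies inside the triangle, whereas independent pipelines only spend slots on covered pixels.

# Pixel slots spent on a small triangle: independent pipelines vs. a 2x2 quad
# pipeline (True marks a pixel covered by the triangle).

def slots_independent(mask):
    return sum(pixel for row in mask for pixel in row)

def slots_quad(mask):
    height, width, slots = len(mask), len(mask[0]), 0
    for y in range(0, height, 2):
        for x in range(0, width, 2):
            block = [mask[y + dy][x + dx]
                     for dy in range(2) for dx in range(2)
                     if y + dy < height and x + dx < width]
            if any(block):
                slots += 4          # the whole quad is issued even for one pixel
    return slots

# A thin diagonal sliver covering 3 pixels of a 4x4 tile:
sliver = [[True,  False, False, False],
          [False, True,  False, False],
          [False, False, True,  False],
          [False, False, False, False]]
print(slots_independent(sliver), slots_quad(sliver))    # 3 vs. 8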
Although the Voodoo5 6000 carries the number "5", it in actual fact belongs to the fourth generation of Voodoo products. One can only assume that 3dfx wanted to differentiate clearly between the single chip variants (the Voodoo4 4500) and the multi-chip cards so as to emphasise their performance to buyers.
                      Voodoo4 4500   Voodoo5 5500   Voodoo5 6000
Pixel pipelines       2              4              8
TMUs per pipeline     1              1              1
Production process    220nm          220nm          220nm
Chip speed            166 MHz        166 MHz        183 MHz
Memory speed          166 MHz        166 MHz        183 MHz
Antialiasing          2x SG-SSAA     4x SG-SSAA     8x SG-SSAA
Fill rate (MTex/sec)  333            666            1464
Bandwidth (GB/sec)    2.6            5.3            11.7
Maximum texture size  2048x2048      2048x2048      2048x2048
Graphics memory       32 MB          2x32 MB        4x32 MB
3dfx promoted the VSA-100 range using the slogan "Fill rate is King", but the delay meant they were unable to pit the range against its intended competition, nVidia's GeForce 256, a confrontation that it could have walked away from with little more than a ruffled collar. Sometimes things just don't work out as planned. Still, our Voodoo5 6000 cards are ample proof of this slogan: combined with the efficiency gains from SLI, the fill rate of 1.46 gigapixels per second and the bandwidth of 11.7 GB/s are unequivocal. None of the Voodoo5 6000's high-end, single-chip contemporaries would have been able to keep up with it. We will go into this more in the benchmarks.
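These headline figures can be reproduced from the table above with a line of arithmetic (our own calculation): fill rate is chips x pipelines x clock, bandwidth the sum of the per-chip 128-bit SDR interfaces.

# Voodoo5 6000: 4 chips, 2 pipelines each, 183 MHz, one 128-bit SDR channel per chip.
chips, pipelines, clock_mhz = 4, 2, 183

fill_rate_mtex = chips * pipelines * clock_mhz              # 1464 MTexel/s (~1.46 GTex/s)
bandwidth_gb_s = chips * 128 / 8 * clock_mhz * 1e6 / 1e9    # ~11.7 GB/s

print(fill_rate_mtex, round(bandwidth_gb_s, 1))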
[Page 3]
Texture filtering
What can we expect from a graphics card in terms of picture quality when its feature set is already seen as outdated at launch? The picture quality is only comparable in a limited sense with that of other graphics cards of the same generation - though this is not necessarily meant negatively.
Trilinear filtering is not a standard fixture of the Voodoo5 with the official drivers, even though the graphics card would of course have been able to do it. Granted, the VSA-100 can only achieve this in a single cycle with single texturing, but the chip was at least able to soften the abrupt LOD transitions on textured surfaces. To that end the drivers have a feature called "mipmap dithering", which works by combining adjacent mipmap levels to generate a sub-image.
The obvious disadvantage of this mode is the dithering itself, which shows up as on-screen granularity that is generally quite apparent and cannot be entirely compensated for even with high supersampling modes. On the plus side, this mode gives an increase in picture quality with next to no resultant loss in performance (depending on where it's used, e.g. in Quake III Arena).
By far the most interesting mode is a type of bilinear filter that's achieved by running two VSA-100s in supersampling mode, with one chip offsetting the mipmap LOD bias by -0.5. This only works because, unlike with normal supersampling, on Voodoo5 cards several images can be merged within the T-Buffer with equal weighting. The end result is an image with a 1-bit LOD fraction, which by definition passes for a trilinear filter. That said, it should be mentioned that while textured surfaces with a 1-bit LOD fraction show a noticeable improvement over pure bilinear filtering, this is still not sufficient for consistent linear mipmap interpolation.
This partial LOD-shift is also only possible with SLI antialiasing, so a Voodoo4 4500 is lacking the one thing it needs to achieve trilinear filtering, since it only has one VSA-100 chip and consequently only one LOD bias register. By the same token, however, it also means that a Voodoo5 6000 can cope with 4 images with different LOD biases instead of 2, so instead of one additional mipmap transition you have three, i.e. a 2-bit LOD fraction.
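A simplified sketch of the principle (our own model of what the driver effectively achieves; the bilinear_sample helper and the evenly spaced biases for the four-chip case are assumptions for illustration): each chip picks its mipmap level from floor(lambda + bias), and averaging the chips' outputs in the T-Buffer produces the intermediate blend steps.

import math

def bilinear_sample(mipmaps, level, u, v):
    # Hypothetical stand-in: return the constant "colour" of the chosen mip level.
    return mipmaps[min(level, len(mipmaps) - 1)]

def sli_pseudo_trilinear(mipmaps, lam, u, v, chip_biases):
    # One bilinear sample per chip, each with its own LOD bias, averaged equally.
    samples = [bilinear_sample(mipmaps, max(0, math.floor(lam + bias)), u, v)
               for bias in chip_biases]
    return sum(samples) / len(samples)

mips = [1.0, 0.5, 0.25, 0.125]                 # one value per mipmap level
two_chips  = sli_pseudo_trilinear(mips, 1.3, 0, 0, [0.0, -0.5])                  # 1-bit fraction
four_chips = sli_pseudo_trilinear(mips, 1.3, 0, 0, [0.0, -0.25, -0.5, -0.75])    # 2-bit fraction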
The drivers do, however, get in the way of one use of this feature: while it's possible with active antialiasing to achieve a performance-friendly, if optically minimal, form of trilinear filtering using the original drivers, it only works correctly with a LOD bias of 0.0 or -1, because changes to the LOD bias are unfortunately only applied to the first graphics chip, while the second chip always uses an LOD of -0.5. Thankfully there are modified Glide drivers that also shift the LOD bias on the second graphics chip correctly.
Some of you may be wondering at this point why the LOD bias needs to be shifted at all, since 0.0 is optimal. The reason is simple: whereas rival companies achieved supersampling by increasing the resolution internally (oversampling) and then downsampling afterwards, which automatically gives an increase in texture quality, 3dfx's method did not automatically produce sharper textures, just oversampled texels. Shifting the mipmap LOD bias compensates for this amply. The theoretical maximum shift possible for four SSAA samples is -1.
In practical terms this comes down to a matter of personal taste, as the ideal balance of maximum sharpness and maximum texture stability differs from game to game. Tests have shown that LOD shifts of -0.5 for 2xAA and -1.5 for 4xAA are reliable. At 4xAA this gives a first mipmap blend at around -2 to -2.5, which is roughly comparable to 4xAF sharpness (albeit without achieving the same texture stability of the latter). This relatively large shift of -1.5 may not be in line with the Nyquist-Shannon sampling theorem, but the theorem applies to the theoretical worst case anyhow in the shape of highest frequency textures (such as a pixel-sized black/white chessboard pattern).
With four supersamples, therefore, if we play it safe, the maximum sharpness achievable without running the risk of underfiltering is 2x AF [more generally, √n-fold AF for n samples]. This only holds for ordered grid antialiasing, however; a less conservative approach would, in addition to the actual sample positions, also take into account the amplitudes of the mipmaps, which are necessarily lower than those of the base map and automatically permit a greater LOD shift, at least insofar as the mipmaps are already being sampled. While this did not allow a fourfold "textbook" AF to be calculated, that was rarely needed given the average texture sharpness in games at the time.
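Put as a rule of thumb (our own restatement of the reasoning above): n supersamples raise the worst-case-safe resolution per axis by √n, so the conservative, Nyquist-safe LOD shift is -ld(√n), while the values that proved reliable in practice are somewhat more aggressive.

import math

def conservative_lod_shift(samples):
    # Worst-case-safe shift: resolution per axis rises by sqrt(n) -> -ld(sqrt(n)).
    return -math.log2(math.sqrt(samples))

practical_shift = {2: -0.5, 4: -1.5}     # values found reliable in the tests above

for n in (2, 4):
    print(n, conservative_lod_shift(n), practical_shift[n])
# 4 samples: conservatively -1.0 (roughly 2x AF sharpness), -1.5 in practice.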
The quest for sharp textures was not triggered by the emergence of anisotropic filters. In fact, a variety of tricks had already been used in Unreal to try and circumvent the limitations of hardware of the day. In addition to detail texture mapping (additional texture overlays used to improve the appearance of surfaces at close range), which is hardly needed nowadays, and macro texture mapping (additional texture overlays used to improve the appearance of surfaces at a distance), which is more likely to still be used today, the Unreal Engine 1.x automatically applied a mipmap LOD bias of -1.5 under Glide.
This means that with bilinear filtering without supersampling, a single screen pixel had to be shared by up to 16 of the base texture's texels. A 1:1 ratio is the ideal, so in effect this amounts to 16-fold underfiltering. By today's standards that would of course be unacceptable, but in 1999 3dfx were praised for the good picture quality in Unreal Tournament, and ironically, the 3dfx accelerators outdid themselves in the performance stakes as well, topping all the benchmarks despite the increase in load from the shifted mipmap LOD bias (although in all fairness it should be noted that the Unreal Engine's design was pretty much perfect for Glide and the Voodoo cards).
Antialiasing
What is it that's so special about the Voodoo5 6000's antialiasing that its quality across the board has still not been bettered by any other manufacturer some 6 years after the demise of 3dfx?
By overlaying several images in the multisample buffer known as the "T-Buffer" it is possible to freely define the AA sample positions within a raster, something that can't be done with simple ordered grid supersampling (OGSS), or oversampling. With this method, the antialiasing is created by a slight displacement of the scenery. Using this "sparse grid", even with just 2 subpixels (i.e. 2xSGSSAA) both the X and Y axes are each sampled twice as accurately, whereas with a square subpixel arrangement (OGSSAA) this requires 2x2 = 4 subpixels (and therefore twice as much load).
A "sparse grid" is a cut-down version, so to speak, of an "ordered grid". The cuts are sensible, mind: while there is a negligible loss of quality in the antialiasing on the most important edges, the corresponding performance gain is considerable. In principle, a 2x OGSSAA can, only upsample one axis and accordingly smooth edges either horizontally or vertically. To achieve an edge equivalent resolution of 8x SGSSA using an ordered grid you have to use 8x8 (= 64x) OGSSAA, which is a good indication that as far as consumers were concerned OGSSAA was more of a token gesture than anything and was only implemented so as to offset technical deficiencies. nVidia matched the texture quality of 64x OGSSAA with the 8x AF on the GeForce3. One of our earlier articles goes into this subject in more detail, covering not just the basics of antialiasing but also, amongst other things, the differences between the different masks.
3dfx is now no longer the only exponent of this method of antialiasing. The R200 (Radeon 8500) was also originally supposed to support rotated grid supersampling, but could actually only do this when no fog was used. S3's 9x SGSSAA mode is the only one that can in fact improve visually on 3dfx's 8x SGSSAA, but this is nigh on unusable as the sample distribution in fullscreen mode appears to be arbitrary, resulting in poor picture quality. Those people who can be persuaded to run games in windowed mode, however, can enjoy the only antialiasing that at the very least matches that found on the Voodoo5 6000.
Of course, both ATI and nVidia have since produced more efficient AA modes, but it is not all that difficult to come up with situations where AAA/TSAA (OpenGL) don't work - quite apart from the fact that the only smoothing these processes provide is on alpha samples (textures with binary alpha values, i.e. each pixel is either completely solid or completely transparent). The G80's new 8x MSAA mode offers a higher 8x8 axis resolution for polygon edges, but other parts of the image are not processed at all, whereas 3dfx's 8x AA still uses 8 supersamples.
The next alternative to 3dfx's 8xAA was nVidia's 16xS mode using TSSAA, which was introduced with the NV40 (and is currently not available on the G80). Like 3dfx's 8xAA, this resulted in an eightfold increase in axis resolution for alpha samples and polygon edges, but it differed from 3dfx's in that it only gave four texture samples rather than eight (AF can of course be used to make up for this). It should also be noted that the "normal" 16xS mode alone (without TSSAA) could not entirely outstrip 3dfx's 4xAA, as it only displays alpha sample edges in the OGSSAA portion at 2x rather than 4x - and that at the expense of a higher CPU load!
Until the introduction of the G70 (and the concomitant introduction of TSAA, which also applied retroactively to NV4x chips) the only option for picture quality fanatics was the highly inefficient 16x OGSSAA, which like 3dfx's 4xAA applied an EER of 4x4 to the whole image. The circumstances of its implementation meant that there were already 16 texture samples in use rather than 4, which in terms of acceptable texture filtering under the high-quality driver settings would have been wasteful. Moreover, of course, this mode required roughly 94% of the available fill rate just for antialiasing, which reduced the performance level of a GeForce 6800 GT down to approximately that of a Voodoo4 4500 (350 to 333 MTexel/s). Interestingly, a Voodoo5 6000 (clocked at 166 MHz) using 4x SGSSAA has precisely the same raw performance as a Voodoo4 4500 without antialiasing as well.
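The numbers in that comparison follow directly from the sample counts (our own arithmetic; the GeForce 6800 GT's base fill rate of 16 pipelines x 350 MHz is an assumption on our part):

# 16x OGSSAA renders 16 samples per output pixel, leaving 1/16 of the fill rate
# for visible pixels - i.e. roughly 94% is spent on antialiasing alone.
geforce_6800gt = 16 * 350                       # MTexel/s (assumed: 16 pipes at 350 MHz)
effective_16x_ogssaa = geforce_6800gt / 16      # 350 MTexel/s
aa_overhead = 1 - 1 / 16                        # 0.9375, i.e. ~94%

voodoo4_4500 = 2 * 166                          # ~333 MTexel/s, no AA
voodoo5_6000_166_with_4x_sgssaa = 8 * 166 / 4   # ~332 MTexel/s effective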
[Page 4]
The 22-bit postfilter
To understand today why the postfilter mattered, it must be borne in mind that even in 2000, 32-bit rendering was by no means a given. While all of the graphics cards of the era could manage 32-bit rendering, for the majority it resulted in a hefty drop in performance that was mostly in no reasonable proportion to the visual improvement. 32-bit rendering was of course promoted by the various manufacturers and desired by nearly all developers, but it was also clear that increasing graphical demands would eventually be too much for the well-established 16-bit rendering to cope with.
The development of Quake III Arena in 1998 essentially proved that particular point - and how. Q3A was one of the first games that looked markedly better in 32-bit than in 16-bit. Moreover, 3dfx did themselves no favours when their Voodoo3 graphics card hit the shelves in 1999, since it was limited to 16-bit rendering. However, from the outset the Voodoo chips had a postfilter mechanism that was fairly effective in eliminating the artifacts caused by 16-bit dithering.
Up to and including the Voodoo2 this was a 4x1 linear filter, which would, within specified threshold values, simply determine an average value based on four adjacent pixels. This did get rid of the irritating dithering artifacts, but in certain situations it created lines that were clearly visible, although this did not affect the picture quality quite so much as the dithering that was caused by rounding errors under 16-bit. With the Banshee, 3dfx had increased the size of the cache in the RAMDAC, which meant they could incorporate a second line from an image in the filter as well.
The result is the 2x2 box filter, which is usually what is being referred to when people talk about 3dfx's "22-bit" rendering. 3dfx themselves spoke at the time of approximately 4 million as the maximum number of colours a postfiltered image of this sort could contain, which roughly corresponds to a 22-bit colour depth. 3dfx's postfilter was by no means a catch-all solution, of course: while it could smooth out existing artifacts by interpolating four pixels, it could not prevent those artifacts from occurring, which becomes apparent with heavy alpha blending, where the large number of rounding errors results in visible dithering. This is because the threshold value within which the postfilter works is exceeded in such cases, meaning that the dithering artifacts are left untouched.
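To make the mechanism a little more tangible, here is a heavily simplified sketch of a threshold-limited 2x2 box filter (our own reconstruction of the principle only; 3dfx never published the exact RAMDAC algorithm or its threshold values):

# Threshold-limited 2x2 box filter on one colour channel: average each pixel with
# its 2x2 neighbourhood, but only where the values are close enough to look like
# dithering; larger differences (real texture detail, heavy alpha-blend errors)
# exceed the threshold and are left untouched.

def postfilter_22bit(channel, threshold=8):
    height, width = len(channel), len(channel[0])
    out = [row[:] for row in channel]
    for y in range(height - 1):
        for x in range(width - 1):
            block = [channel[y][x], channel[y][x + 1],
                     channel[y + 1][x], channel[y + 1][x + 1]]
            if max(block) - min(block) <= threshold:
                out[y][x] = sum(block) / 4
    return out

# The lightly dithered patch on the left is smoothed; the hard 30-to-0 edge on
# the right exceeds the threshold and stays sharp.
patch = [[30, 32, 30,  0],
         [32, 30,  0,  2],
         [30, 32,  2,  0]]
smoothed = postfilter_22bit(patch)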
Another flaw is that when smoothing, the filter is unable to differentiate between a dither pattern caused by 16-bit rendering and one that is part of a texture's intended structure. Accordingly, there are also instances where the box filter can have a negative impact on a texture's design. This effect might have been quite practical for mipmap dithering, but that dither pattern was often so intense that the threshold for postfiltering was exceeded and as a result it couldn't be smoothed either. In practice, however, the postfilter did its job so effectively that for a long while 3dfx users were unable to reproduce the described problems with 16-bit rendering, while the output from other cards clearly degenerated because of it.
Even with the introduction of native 32-bit rendering, the postfilter actually became more significant on the Voodoo5, as Voodoo5 users were for the most part willing to trade a setting in order to enjoy the unrivalled AA modes. The outcome was that 32-bit rendering would be passed over in favour of high performance at the highest possible resolutions, which in most games of the time cost only a minimal amount of visual quality.
So the performance loss that comes from activating the postfilter, while measurable, was in reality not noticeable for the most part. The 3dfx supersampling then has the effect of oversampling the on-screen pixel, which reduces the 16-bit dithering considerably before the postfilter is applied in the RAMDAC. This was in fact so effective that 3dfx deactivated the postfilter completely in 4xAA and still managed to conjure up a 22-bit on-screen image that was entirely devoid of artifacts.
Z-buffer accuracy
There is one downside of 16-bit rendering, however, that neither the postfilter nor supersampling could address: accuracy issues with the z-buffer. Ever-lengthening draw distances and increasingly detailed scenery meant that scene depth was placing growing demands on the accuracy of the z-buffer. While Glide, which used a non-linear quantisation of depth information not too dissimilar to the w-buffer, for the most part remained unaffected by precision problems, people playing newer games that used a 16-bit z-buffer had to contend with polygon shimmer, also known as "z-fighting". This, too, can be remedied, however - for example, w-buffering is available in the game Mafia, which distributes accuracy more evenly by using a modified interpolation, completely eliminating the z-fighting in some instances.
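A little arithmetic makes the precision imbalance obvious (our own sketch; the exact quantisation Glide uses is not reproduced here, the standard hyperbolic z mapping merely stands in for a conventional 16-bit z-buffer): the z-buffer crams almost all of its 65,536 steps into the region near the camera, while a w-buffer spreads them roughly evenly over the view distance.

# How many of a 16-bit buffer's 65,536 depth steps remain for the far half of a
# 1..10,000 unit view frustum?

def z_buffer_value(z, near=1.0, far=10_000.0):
    # Standard hyperbolic z mapping: 0 at the near plane, 1 at the far plane.
    return far * (z - near) / (z * (far - near))

def w_buffer_value(z, near=1.0, far=10_000.0):
    # Linear in view depth.
    return (z - near) / (far - near)

steps = 2 ** 16
far_half_z = steps * (1 - z_buffer_value(5_000.0))   # ~7 steps for everything beyond z = 5000
far_half_w = steps * (1 - w_buffer_value(5_000.0))   # ~32,771 steps

print(round(far_half_z), round(far_half_w))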
At this juncture we need to take a moment to mention "WickedGL", the OpenGL Glide wrapper developed by Metabyte, which we used ourselves for our benchmarks. In addition to enabling the user to force texture compression, the wrapper's sleek and speedy implementation (it weighs in at only 340 KB, barely more than the 320 KB of the last official Glide3x driver) passes OpenGL games through Glide, which under normal circumstances is quite capable of satisfactory 16-bit precision, thereby often removing the only remaining advantage of 32-bit.
Picture quality: the consequences
Overall it's evident that 3dfx didn't make life particularly easy for themselves. For a long time the end user just had to blindly accept the overly enthusiastic claims of 3dfx's marketing department or get an idea of the picture quality first-hand, as it was simply not possible to take screenshots that included the postfiltering. This meant that eager hardware sites such as 3DConcept were left to twiddle their thumbs until late changes to the HyperSnapDX software allowed them to go into the topic in more depth. By then, however, it was already too late to do anything about the glut of misleading screenshots that had been published for every Voodoo card up to and including the Voodoo3. Furthermore, the software was never adapted to work with the Voodoo5, which resulted in screenshots with active 2x antialiasing showing obvious colour banding that was not actually visible on screen.
There was also another problem, albeit one that only manifested on the Voodoo5 6000: it was virtually impossible to correctly capture any antialiasing mode in screenshots under Direct3D; this only worked in Glide and OpenGL games (either natively or with the WickedGL wrapper). The mode also suffered from another problem, namely the partial custom mipmap LOD applied to bilinear filtering with supersampling, meaning there was absolutely no way of capturing the true display quality of a Voodoo5 6000 in screenshots when using 8x SGSSAA and a LOD of -2.
Because of this, all the 8x SGSSAA screenshots in this article use a LOD of -1, which is roughly equivalent to 2xAF. In actual fact there is no reason why you can't set the LOD to -2 in the tested games at this high antialiasing mode, which thanks to the simple trick of shifting the LOD on the second chip, as explained above, approximates the sharpness level of 5xAF. This means that the Voodoo5 6000 can produce pictures of a quality that is still eye-catching even by today's standards - with an associated load that more often than not has a significant impact on end performance. Still, in extreme circumstances (alpha samples galore, say) even a Radeon X1950 XTX can lose up to a tenth of its overall performance when running at 6x AAA + ASBT, but a performance hit like that is still tolerable so long as things remain playable.
By contrast, nVidia's G80 chip comes with no such power-hungry AA modes. While the 16x OGSSAA mode that was still - unofficially - available on the G70 cards was a massive performance hit, leaving just one sixteenth of the original fill rate free for actual rendering, the G80's antialiasing repertoire is presently headed by its 16xQAA multisampling mode with TSSAA. While this is considerably more efficient in terms of the performance cost for image enhancement, not all of the screen's contents are processed.
In addition to polygon edges, the antialiasing samples only enhance alpha samples; textures and pixel shaders are free to flicker on their merry way. Because of this, the G70 must be considered as having the higher maximum possible picture quality overall at this time, even though this is only achieved in older games, or at framerates that are now less than acceptable. To put it another way, a single non-mipmapped scene (whether because of a design error or an oversight, as is so often the case with console ports) would be all that was needed to make today's state-of-the-art graphics cards look worse than 3dfx's 6-year-old card ;).
It is really the T-buffer that represents the focal point of this entire discussion of picture quality. Sadly its practical use never went beyond antialiasing, but if the T-buffer's capabilities had been exploited consistently, the overall cost of improved picture quality would in turn have been much lower, because soft shadows, soft reflections, motion blur and depth of field or antialiasing could all be calculated in a single pass (antialiasing and depth of field are fullscreen effects, so they each need a frame buffer of their own), meaning there would be no additional processor load (apart from the complex shifting of vertex coordinates). However, there was no API extension available at the time that would have allowed developers to access these features directly: the transformations had to be done manually. A modular T-buffer engine would therefore have had to be written for these features to find their way into commercial games.
[Page 1]
15 December 2000 signalled the end of an era: on that date over six years ago, 3dfx, the pioneers of filtered polygon acceleration on PCs, closed its doors. nVidia, their main rival, bought them out for 112 million dollars - patents, technologies, the lot. It was a black day for the 3dfx community, which until then had been considerable in size. One question remained, however: could they have signed off with a flourish?
This article is dedicated to answering precisely that question by looking at the fastest of the Voodoo 5 graphic cards, an artifact of days long past, namely the Voodoo5 6000 AGP, the masterpiece that never officially made it out of 3dfx's doors.
Background
Ah, but we remember it well: the graphics card made quite an impact when it was first presented to the public at Comdex in early 1999, astonishing everyone with its four graphics processors. What we didn't know then was that the card would never make it to retail.
The model that was displayed then was based on the "original design", which still used a 2x2 chip arrangement - and was nowhere near functioning. The development process revealed major issues with the card layout, and it was eventually discarded in favour of the four-in-a-row layout that has since become famous. Even that caused problems right up to the end, primarily with instability under heavy load, a bug that is caused from the PCI subsystem and continues to plague all existing Voodoo5 6000 prototypes. It is primarily this bug that prevented the Voodoo5 6000 from appearing on store shelves.
Subsequent efforts to remedy the problem, such as underclocking newer revisions from the anticipated 183 MHz to 166 MHz were also in vain. Then the axe dropped, caused by the massive development costs and repeated disappointing quarterly results. On 13 November 2000 3dfx reported that it had transferred all of its SLI patents to its Quantum3D subsidisary (which still exists today) and had pulled out of the graphics card business. All the Voodoo5 6000 prototypes that had been made up to that point also went to Quantum3D. Well, almost all of them...
It is estimated that some 200 such prototypes exist, but not all of these are working cards - word on the Internet is that that figure is more like 100. Worldwide. Who it was that smuggled some of these out from under Quantum3D's noses will likely forever remain a mystery, and the number of V56k cards that have ceased to function in the meantime is similarly unknown. Most prototypes have found their way into the hands of genuine fans, but even today cards appear for sale on online auction sites from time to time.
All of these 6-year-old graphics cards have one thing in common: they cost a pretty penny. A defect card will set you back at least 500 euros, while a working Voodoo5 6000 clocked at 183 MHz can easily cost two or three times as much. However, that figure becomes much more palatable when considered against the background of the card's "population density" and the cost of current high-end graphics cards, as unlike the latest products to come off ATI and nVidia's respective production lines, the Voodoo5 6000 has maintained its value astonishingly well over the years - much to the disappointment of those of us with not quite so much cash to splash.
This is where we come in. It took us almost 6 years, but today we are able to proudly present what we were all so cruelly denied back then - a review of the Voodoo5 6000. Sure, we know we're not the first, as the odd test has appeared on the Net in the intervening period (not to mention in German print magazine PC Games Hardware last year), but we can guarantee that we have found the answers to all the questions about 3dfx's last functioning product - those questions that we asked ourselves when we read the other tests (and more besides).
The test cards
Two identical Voodoo5 6000 cards were used for this article, both of which were examples of the best known and most stable revision built, the "Final Revision 3700-A". The numbers indicate that they were produced in the 37th calendar week of 2000. Prototypes like these can almost be described as fully functional graphics cards - and that caveat is only because they, too, display the PCI bug described above. The kicker is that this problem was also solved, but unfortunately only after 3dfx had invested copious resources on it and, ultimately, collapsed.
Hank Semenec, the "Godfather" of the Voodoo5 6000, who now plies his trade at Quantum3D, came up with the so-called "PCI Rework" in his free time, which removed the instabilities. The bug fix manifests in two ways, one internal and one external, each of which was in evidence on one of our test cards. With the fix, both are fully useable and the revolutionary AA modes, which play a significant role in the tests to follow, are completely stable. We must also thank Hank Semenec for the repair that allowed one of our 2 Voodoo5 6000 cards to even function. Our thanks once again!
[The Voodoo5 6000 AGP Rev. 3700-A is exactly 31 centimetres in length. At one end you can see the upside-down HiNT-Bridge, to the left of the GPUs, together with the power supply soldered directly to the PCB. The original "Voodoo Volts" power supply concept is not really necessary with current PSUs (and is very rare anyhowe), but at the time it was the only way to make some computers Voodoo5 6000 compatible (Image: © PC Games Hardware).]
[On the rear can be seen masses of SMDs (Surface Mounted Devices) laid out in what was at that time a very high density (Image: © PC Games Hardware).]
[The external PCI rework on our test card. Note the "Not for Sale" sticker just next to it, evidence that the card was clearly a prototype (Image: © PC Games Hardware).]
Before delving into framerates and such like, we're first going to take a loook at the underlying technology, the resultant picture quality and, of course, the test settings.
[Page 2]
The VSA-100 chip
The last generation of cards from Messrs. 3dfx were still based on the original Voodoo Graphics design dating back to 1996. It's because of this, "Rampage", the true successor to Voodoo, never reached the public, despite several attempts to remarket it. In short, this was caused - as with so many of the negative aspects of 3dfx - by insufficient R&D resources.
Unfortunately, modern features such as environment and per-pixel lighting and hardware-accelerated transformations didn't find their way onto the VSA-100. In reality that was an irrelevancy for this generation of cards, as there were next to no games that used such features. 3dfx went in a different direction with VSA-100, implementing some of the ideas from Rampage that could be used in every game and noticeable improved picture quality - in particular the famous "T-Buffer" (the "T" was from the surname of Gary Tarolli, 3dfx's co-founder and chief designer). It enabled:
* scaleable SG-FSAA (sparse grid full-scene antialiasing - more about this later)
* depth of field (depth blurring, to increase realism in games)
* motionblur (a type of temporal antialiasing where the actual framerates are split in order to merge several frames into one. The amount of temporal data for the end image therefore increases for each frame used. It should not be confused with Motion Trail, which simply blurs between consecutive frames.)
* soft shadows (smooth graduation of shadows)
* soft reflections (smooth graduation of reflections)
The transistor budget for the VSA-100 was clearly invested in speed and scalability rather than on a checklist of features. Incidentally, "VSA" is an abbreviation of Voodoo Scalable Architecture, and an eponymous one at that: by scaling the number of chips on the graphics card it was possible to accommodate every segment of the graphics card market at once, resulting in massive savings on R&D resources, because only one GPU had to be developed and completed.
This was made possible by the use of scanline interleaving (SLI - not to be confused with nVidia's SLI). The frame to be rendered was split into lines, and each GPU was responsible for rendering its own section. More SLI'd GPUs meant less work for each of the VSA-100 chips to perform. The RAMDAC would then assemble the fully rendered "line combs" from the GPUs. This has the advantage over the now standard AFR/SFR approach that there is no need for CPU-intensive loadbalancing or the precalculation of several images, as all of the GPUs are working on the same one.
Unlike with AFR (Alternate Frame Rendering), the video memory is also involved in scaling, at least in part. Although there is redundant retention of textures due to the GPUs not having access to the texture memory of each of the other GPUs (which would also have been extremely difficult to achieve), the frame buffer required for each GPU actually ends up being smaller, because the image being rendered by each of them is smaller.
The bandwidth scaling is also excellent, with each VSA-100 having its own memory interface. On a Voodoo5 5500 this would have been two independent 128-bit SDR SDRAM interfaces. This should be considered as far more effective than a single 128-bit wide DDR SDRAM interface, because it means that there is less wasted bandwidth and there are fewer stalls in the graphics pipeline caused by memory reads. It's for precisely this reason that ATI and nVidia subsequently also later split their memory interfaces into several smaller ones.
All of these advantages mean that the SLI process gives an unrivaled level of efficiency when scaling. It was possible - and worthwhile - to link together up to 32 such GPUs, but this remained the preserve of the professional military simulators produced by 3dfx subsidiary Quantum3D. For the consumer market, if we include the Voodoo5 6000 this would have been limited to 4 GPUs per graphics card.
The reason this approach died a death following 3dfx's demise is simple: the hardware-accelerated transformation suffers from one major disadvantage. All of the GPUs in an SLI setup have to calculate the same geometry, meaning that the performance in terms of geometry is not scaled accordingly, although the TnL-less VSA-100 chip was only mildly affected by this (presumably the Rampage was intended to have an external geometry unit). In addition, the texture bandwidth is not scaled linearly in an SLI setup, since there was no way of determining which texture parts were required and which were not, only whether textures needed to be available in their entirety. Again, however, the VSA-100 apparently had enough bandwidth per texel to ensure that this did not became a more significant disadvantage.
In terms of the standards in 2000, the VSA-100 incorporated 32-bit framebuffer support and increased the maximum texture size from the antiquated 256x256 pixels to 2048x2048, making hi-res textures a reality on Voodoo cards. To implement this feature sensibly and prevent it from overloading the texture bandwidth 3dfx - like its competitors - used texture compression.
While the Voodoo3 supported NCC (narrow channel compression), this proprietary system was never fully used by games. The VSA-100, by contrast, also made use of S3 Texture Compression (S3TC), which was widely adopted. Because using S3TC in OpenGL games meant paying a licence fee to S3 Graphics, however, 3dfx developed a similar system called FXT. Metabyte's well-known OpenGL Glide wrapper "WickedGL" allows users to convert every texture in OpenGL games to the FXT format.
The VSA-100 was the first Voodoo chip to have 2 independent pixel pipelines, meaning it could process 2 different pixels per cycle, which was often more efficient than the Voodoo3's multi-texturing pipeline. The design was therefore widened, but shortened at the same time. Rival companies were already using quad pipelines to texture polygons, which sounds progressive, but in this instance that not the case:
Broadly speaking, a quad pipeline is a single pipeline repeated four times, which can calculate one pixel quad, i.e. a 2x2 pixel array (shown in blue on the diagram) per cycle. This approach saves considerably on the number of transistors needed for operating logic, but it also means that all of the pixel pipelines are limited to performing the same operation, which particularly with small triangles is not always optimal (as shown by the white pixel within the blue line on the diagram). The VSA-100, like all pre-GeForce renderers, does not suffer from this.
Although the Voodoo5 6000 carries the number "5", it in actual fact belongs to the fourth generation of Voodoo products. One can only assume that 3dfx wanted to differentiate clearly between the single chip variants (the Voodoo4 4500) and the multi-chip cards so as to emphasise their performance to buyers.
Voodoo4 4500 Voodoo5 5500 Voodoo5 6000
Pixel pipelines 2 4 8
TMUs per pipeline 1 1 1
Production process 220nm 220nm 220nm
Chip speed 166 MHz 166 MHz 183 MHz
Memory speed 166 MHz 166 MHz 183 MHz
antialiasing 2x SG-SSAA 4x SG-SSAA 8x SG-SSAA
Fill rate MTex/sec 333 666 1464
Bandwidth GB/sec 2.6 5.3 11.7
Maximum texture size 2048x2048 2048x2048 2048x2048
Graphics memory 32 MB 2x32 MB 4x32 MB
3dfx promoted the VSA-100 range using the slogan "Fill rate is King", but the delay meant they were unable to pit the range against its intended competition, nVidia's GeForce 256, a confrontation that it could have walked away from with little more than a ruffled collar. Sometimes things just don't work out as planned. Still, our Voodoo5 6000 cards are ample proof of this slogan: combined with the efficiency gains from SLI, the fill rate of 1.46 gigapixels per second and the bandwidth of 11.7 GB/s are unequivocal. None of the Voodoo5 6000's high-end, single-chip contemporaries would have been able to keep up with it. We will go into this more in the benchmarks.
[Page 3]
Texture filtering
What can we expect from a graphics card in terms of picture quality when its feature set is already seen as outdated at launch? The picture quality is only comparable in a limited sense with that of other graphics cards of the same generation - though this is not necessarily meant negatively.
Trilinear filtering is not a standard fixture of the Voodoo5 using the official drivers, even though the graphics card would of course have been able to do it. Granted, the VSA-100 can only achieve this in one processor cycle with single texturing, but the chip was also able to soften the abrupt LOD transitions with textured surfaces. To that end the drivers have a feature called "mipmap dithering", which works by combining adjacent mipmap levels to generate a sub-image.
The obvious disadvantage to this mode is the dithering itself, which is evidenced by on-screen granularity that is generally quite apparent and cannot be entirely compensated for using high supersampling modes. On the plus side, this mode gives in increase in picture quality with next to no resultant loss in performance (depending on where it's used, e.g. in Quake III Arena).
By far the most interesting mode is a type of bilinear filter that's achieved by running two VSA-100s in supersampling mode, with one chip offsetting the mipmap LOD bias by -0.5. This only works because unlike normal supersampling, with Voodoo5 cards several images can be merged within the T-Buffer with the same balance. The end result is an image with a 1-bit LOD fraction, which by definition passes for a trilinear filter. That said, it should be mentioned that while textured surfaces with a 1-bit LOD fraction produce a noticeable improvement over pure bilinear filtering, it is still not sufficient for consistent linear mipmap interpolation.
This partial LOD-shift is also only possible with SLI antialiasing, so a Voodoo4 4500 is lacking the one thing it needs to achieve trilinear filtering, since it only has one VSA-100 chip and consequently only one LOD bias register. By the same token, however, it also means that a Voodoo5 6000 can cope with 4 images with different LOD biases instead of 2, so instead of one additional mipmap transition you have three, i.e. a 2-bit LOD fraction.
The drivers do prevent one resultant use of this feature: while it's possible with active antialiasing to achieve performance-friendly, optically minimalist trilinear filtering using the original drivers, it only works correctly with a LOD bias of 0.0 or -1, because unfortunately changes to the LOD bias are only applied to the first graphics chip, and the second chip always uses an LOD of -0.5. Thankfully there are modified Glide drivers that also shift the LOD bias on the second graphics chip correctly.
Some of you may be wondering at this point why the LOD bias needs to be shifted at all, since 0.0 is optimal. The reason is simple: whereas rival companies achieved supersampling by increasing the resolution internally (oversampling) and then downsampling afterwards, which automatically gives an increase in texture quality, 3dfx's method did not automatically produce sharper textures, just oversampled texels. Shifting the mipmap LOD bias compensates for this amply. The theoretical maximum shift possible for four SSAA samples is -1.
In practical terms this comes down to a matter of personal taste, as the ideal balance of maximum sharpness and maximum texture stability differs >from game to game. Tests have shown that LOD shifts of -0.5 for 2xAA and -1.5 for 4xAA are reliable. At 4xAA this gives a first mipmap blend at around -2 to -2.5, which is roughly comparable to 4xAF sharpness (albeit without achieving the same texture stability of the latter). This relatively large shift of -1.5 may not be in line with the Nyquist-Shannon sampling theorem, but the theorem applies to the theoretical worst case anyhow in the shape of highest frequency textures (such as a pixel-sized black/white chessboard pattern).
With four supersamples, therefore, if we play it safe, the actual maximum sharpness possible at which you don't have to run the risk of underfiltering is 2x AF [ld(n)=x]. This is only the case for ordered grid antialiasing, however; a less conservative approach would, in addition to the actual sample positions, also take into account the amplitudes of the mipmaps, which in comparison to the base map are necessarily lower and automatically permit the use of a greater LOD shift, at least insofar as the mipmaps are already being sampled. While this did not enable a fourfold "textbook" AF to be calculated, the same was rarely needed given the average texture sharpness in games at that time.
The quest for sharp textures was not triggered by the emergence of anisotropic filters. In fact, a variety of tricks had already been used in Unreal to try and circumvent the limitations of hardware of the day. In addition to detail texture mapping (additional texture overlays used to improve the appearance of surfaces at close range), which is hardly needed nowadays, and macro texture mapping (additional texture overlays used to improve the appearance of surfaces at a distance), which is more likely to still be used today, the Unreal Engine 1.x automatically applied a mipmap LOD bias of -1.5 under Glide.
This means that with bilinear filtering without supersampling, a single screen pixel had to be shared by up to 16 of the base texture's texels. A 1:1 ratio is the ideal, so in effect this amounts to 16-fold underfiltering. By today's standards that would of course be unacceptable, but in 1999 3dfx were praised for the good picture quality in Unreal Tournament, and ironically, the 3dfx accelerators outdid themselves in the performance stakes as well, topping all the benchmarks despite the increase in load from the shifted mipmap LOD bias (although in all fairness it should be noted that the Unreal Engine's design was pretty much perfect for Glide and the Voodoo cards).
Antialiasing
What is it that's so special about the Voodoo5 6000's antialiasing that its quality across the board has still not been bettered by any other manufacturer some 6 years after the demise of 3dfx?
By overlaying several images in the multisample buffer known as the "T-Buffer" it is possible to freely define the AA sample positions within a raster, something that can't be done with simple ordered grid supersampling (OGSS), or oversampling. With this method, the antialiasing is created by a slight displacement of the scenery. Using this "sparse grid", even with just 2 subpixels (i.e. 2xSGSSAA) both the X and Y axes are each sampled twice as accurately, whereas with a square subpixel arrangement (OGSSAA) this requires 2x2 = 4 subpixels (and therefore twice as much load).
A "sparse grid" is a cut-down version, so to speak, of an "ordered grid". The cuts are sensible, mind: while there is a negligible loss of quality in the antialiasing on the most important edges, the corresponding performance gain is considerable. In principle, a 2x OGSSAA can, only upsample one axis and accordingly smooth edges either horizontally or vertically. To achieve an edge equivalent resolution of 8x SGSSA using an ordered grid you have to use 8x8 (= 64x) OGSSAA, which is a good indication that as far as consumers were concerned OGSSAA was more of a token gesture than anything and was only implemented so as to offset technical deficiencies. nVidia matched the texture quality of 64x OGSSAA with the 8x AF on the GeForce3. One of our earlier articles goes into this subject in more detail, covering not just the basics of antialiasing but also, amongst other things, the differences between the different masks.
3dfx is now no longer the only exponent of this method of antialiasing. The R200 (Radeon 8500) was also originally supposed to support rotated grid supersampling, but could actually only do this when no fog was used. S3's 9x SGSSAA mode is the only one that can in fact improve visually on 3dfx's 8x SGSSAA, but this is nigh on unusable as the sample distribution in fullscreen mode appears to be arbitrary, resulting in poor picture quality. Those people who can be persuaded to run games in windowed mode, however, can enjoy the only antialiasing that at the very least matches that found on the Voodoo5 6000.
Of course, both ATI and nVidia have since produced more efficient AA modes, but it is not all that difficult to one up with situations where AAA/TSAA (OpenGL) don't work - once we look beyond the fact that the only smoothing these processes provide is with alpha samples (textures with binary alpha values, i.e. each pixel is either completely solid or completely transparent). The G80's new 8x MSAA mode offers a higher 8x8 axis resolution for polygon edges, but other parts of the image are not processed at all, whereas 3dfx's 8x AA still uses 8 supersamples.
The closest alternative to 3dfx's 8xAA was nVidia's 16xS mode combined with TSSAA, introduced with the NV40 (and currently not available on the G80). Like 3dfx's 8xAA, this gave an eightfold increase in axis resolution for alpha samples and polygon edges, but it differed in providing only four texture samples rather than eight (AF can of course be used to make up for this). It should also be noted that the "normal" 16xS mode on its own (without TSSAA) could not entirely outstrip 3dfx's 4xAA, as its OGSSAA portion smooths alpha-sample edges only at 2x rather than 4x - and that at the expense of a higher CPU load!
Until the introduction of the G70 (and with it TSAA, which was also made available retroactively on NV4x chips), the only option for picture-quality fanatics was the highly inefficient 16x OGSSAA, which like 3dfx's 4xAA applied an EER of 4x4 to the whole image. The way it was implemented meant that 16 texture samples were used rather than 4, which, given that the high-quality driver settings already deliver acceptable texture filtering, was simply wasteful. Moreover, this mode required roughly 94% of the available fill rate just for antialiasing, which reduced the performance level of a GeForce 6800 GT to approximately that of a Voodoo4 4500 (350 vs. 333 MTexel/s). Interestingly, a Voodoo5 6000 (clocked at 166 MHz) using 4x SGSSAA has precisely the same raw performance as a Voodoo4 4500 without antialiasing.
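The arithmetic behind these figures can be reproduced in a few lines; the clock rates and pipeline counts below are the commonly quoted specifications for these cards and are used here purely as assumptions:

# Rough fill-rate arithmetic behind the comparison above.

def mtexel_per_s(clock_mhz, texel_pipes):
    return clock_mhz * texel_pipes

gf6800gt = mtexel_per_s(350, 16)      # ~5600 MTexel/s
v4_4500  = mtexel_per_s(166, 2)       # ~333 MTexel/s
v5_6000  = mtexel_per_s(166, 2 * 4)   # four VSA-100 chips, ~1333 MTexel/s

print("6800 GT with 16x OGSSAA :", gf6800gt / 16, "MTexel/s per output pixel")  # 350.0
print("V5 6000 with 4x SGSSAA  :", v5_6000 / 4, "MTexel/s per output pixel")    # ~333
print("Voodoo4 4500, no AA     :", v4_4500, "MTexel/s")                         # ~333
print("AA share of fill rate at 16x OGSSAA:", f"{15 / 16:.0%}")                 # 94%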
[Page 4]
The 22-bit postfilter
To understand the reasoning behind the postfilter nowadays, it must be borne in mind that even in 2000, 32-bit rendering was by no means a given. While all of the graphics cards of the era could manage 32-bit rendering, for the majority it resulted in a hefty drop in performance that for the most part was in no reasonable sense proportionate to the visual improvement. 32-bit rendering was of course promoted by the various manufacturers and desired by nearly all developers, but it was also clear that the increasing graphical demands would eventually be too much for the well-established 16-bit rendering to cope with.
The development of Quake III Arena in 1998 essentially proved that particular point - and how. Q3A was one of the first games that looked markedly better in 32-bit than in 16-bit. Moreover, 3dfx did themselves no favours when their Voodoo3 graphics card hit the shelves in 1999, since it was limited to 16-bit rendering. However, from the outset the Voodoo chips had featured a postfilter mechanism that was fairly effective in eliminating the artifacts caused by 16-bit dithering.
Up to and including the Voodoo2 this was a 4x1 linear filter, which would, within specified threshold values, simply determine an average value based on four adjacent pixels. This did get rid of the irritating dithering artifacts, but in certain situations it created lines that were clearly visible, although this did not affect the picture quality quite so much as the dithering that was caused by rounding errors under 16-bit. With the Banshee, 3dfx had increased the size of the cache in the RAMDAC, which meant they could incorporate a second line from an image in the filter as well.
The result is the 2x2 box filter, which is usually what is meant when people talk about 3dfx's "22-bit" rendering. 3dfx themselves quoted approximately 4 million as the maximum number of colours such a postfiltered image could contain, which roughly corresponds to a 22-bit colour depth. The postfilter was by no means a catch-all solution, of course: while it could smooth out existing artifacts by interpolating four pixels, it could not prevent those artifacts from occurring in the first place, which becomes apparent with heavy alpha blending, where the large number of rounding errors produces visible dithering. In such cases the threshold value within which the postfilter operates is exceeded, so the dithering artifacts are left untouched.
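The following is a much-simplified model of such a thresholded 2x2 box filter (our own reconstruction operating on single-channel values, not 3dfx's actual RAMDAC logic); it shows how dither-sized differences are averaged away while larger differences, such as real edges or heavy alpha-blending artifacts, pass through untouched:

def postfilter_2x2(img, threshold=8):
    """Smooth 2x2 blocks whose values differ by no more than the threshold."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h - 1):
        for x in range(w - 1):
            block = [img[y][x], img[y][x + 1], img[y + 1][x], img[y + 1][x + 1]]
            if max(block) - min(block) <= threshold:   # dither-sized difference
                out[y][x] = sum(block) // 4            # smooth it out
            # otherwise: likely a real edge (or too-strong dithering) -> keep
    return out

dithered = [[100, 104, 100, 104],
            [104, 100, 104, 100],
            [100, 104, 200, 204]]      # bottom-right: a real edge, not dither
print(postfilter_2x2(dithered))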
Another flaw is that, when smoothing, the filter cannot differentiate between a dither pattern caused by 16-bit rendering and one that is part of a texture's intended structure. Accordingly, there are also instances where the box filter can have a negative impact on a texture's design. This effect could actually be quite useful for mipmap dithering, but there the dither pattern was often so intense that the postfilter's threshold was exceeded and it couldn't be smoothed either. In practice, however, the postfilter did its job so effectively that for a long while 3dfx users were unable to reproduce the described problems with 16-bit rendering, while the output of other cards visibly degenerated because of them.
Even with the introduction of native 32-bit rendering on the Voodoo5, the postfilter actually gained in significance, as most Voodoo5 users were quite prepared to change a setting in order to enjoy the unrivalled AA modes. The outcome was that 32-bit rendering was passed over in favour of high performance at the highest possible resolutions, which in most games of the time cost only a minimal amount of visual quality.
So the performance loss that came from activating the postfilter, while measurable, was in reality not noticeable for the most part. 3dfx's supersampling then has the additional effect of oversampling each on-screen pixel, which considerably reduces the 16-bit dithering even before the postfilter is applied in the RAMDAC. This was so effective, in fact, that with 4xAA 3dfx deactivated the postfilter completely and still managed to conjure up a 22-bit on-screen image entirely devoid of artifacts.
Z-buffer accuracy
There is one downside of 16-bit rendering, however, that neither the postfilter nor supersampling could address: accuracy problems with the z-buffer. Ever-longer draw distances and increasingly detailed scenes placed growing demands on the accuracy of the z-buffer. While Glide, which used a non-linear quantisation of depth information not too dissimilar to a w-buffer, remained largely unaffected by precision problems, people playing newer games that relied on a 16-bit z-buffer had to contend with polygon shimmer, also known as "z-fighting". This, too, could be remedied in some cases - Mafia, for example, offers w-buffering, which distributes precision more evenly through a modified interpolation and in some instances eliminates the z-fighting completely.
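A small numerical sketch (our own; the standard perspective depth mapping is assumed for the non-Glide case, and the w-buffer is modelled as simply linear in eye space) makes the problem tangible: with 16 bits, the classic z-buffer spends most of its precision right next to the viewer, while a linearly quantised w-buffer spreads it evenly:

NEAR, FAR, STEPS = 1.0, 10_000.0, 2 ** 16

def z_to_depth(z_eye):
    """Standard perspective z-buffer value in [0, 1]: f/(f-n) * (1 - n/z_eye)."""
    return (FAR / (FAR - NEAR)) * (1.0 - NEAR / z_eye)

def eye_step_at(z_eye, eps=1e-3):
    """Eye-space distance covered by one 16-bit depth step around z_eye."""
    dz = (z_to_depth(z_eye + eps) - z_to_depth(z_eye)) / eps   # d(depth)/d(z_eye)
    return (1.0 / STEPS) / dz

for z in (2.0, 100.0, 5000.0):
    w_step = (FAR - NEAR) / STEPS        # a linear w-buffer spreads steps evenly
    print(f"z_eye={z:7.1f}:  z-buffer step = {eye_step_at(z):10.4f}  "
          f"w-buffer step = {w_step:.4f}")
# At z_eye=5000 one 16-bit z-buffer step spans roughly 380 eye-space units,
# which is exactly the regime in which z-fighting becomes visible.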
At this juncture we need to take a moment to mention "WickedGL", the OpenGL-to-Glide wrapper developed by Metabyte, which we ourselves used for our benchmarks. Besides letting the user force texture compression, this sleek and speedy wrapper (it weighs in at only 340 kB, compared to 320 kB for the last official Glide3x driver) translates OpenGL to Glide - which under normal circumstances delivers quite satisfactory 16-bit precision - and thereby provides perfectly adequate 16-bit precision in OpenGL games as well, often removing the last remaining advantage of 32-bit.
Picture quality: the consequences
Overall it's evident that 3dfx didn't make life particularly easy for themselves. For a long time end users simply had to take the overly enthusiastic claims of 3dfx's marketing department on faith or form an impression of the picture quality first-hand, as it was simply not possible to take screenshots that included the postfiltering. This meant that eager hardware sites such as 3DConcept were left twiddling their thumbs until late changes to the HyperSnapDX software finally allowed them to examine the topic in more depth. By then, however, it was already too late to do anything about the glut of misleading screenshots that had been published of every Voodoo card up to and including the Voodoo3. Furthermore, the software was never adapted to work with the Voodoo5, with the result that screenshots taken with 2x antialiasing active showed obvious colour banding that was not actually visible on screen.
There was also another problem, albeit one that only manifested itself on the Voodoo5 6000: under Direct3D it is virtually impossible to correctly capture any of its antialiasing modes in screenshots; this only worked in Glide and OpenGL games (either natively or via the WickedGL wrapper). On top of that, with supersampling part of the custom mipmap LOD shift is applied to the bilinear filtering, meaning there was absolutely no way of capturing the true display quality of a Voodoo5 6000 in screenshots when using 8x SGSSAA and a LOD of -2.
Because of this, all the 8x SGSSAA screenshots in this article use a LOD of -1, which is roughly equivalent to 2xAF. In actual fact there is nothing to stop you setting the LOD to -2 in the tested games at this high antialiasing mode, which, due to the simple trick of LOD shifting to the second chip as explained above, approximates the sharpness level of 5xAF. This means that the Voodoo5 6000 can produce pictures of a quality that is still eye-catching even by today's standards - with an associated processor load that more often than not has a significant impact on final performance. Still, in extreme circumstances (alpha samples galore, say) even a Radeon X1950 XTX can lose up to a tenth of its overall performance when running 6x AAA + ASBT, but a performance hit like that is still tolerable so long as things remain playable.
By contrast, nVidia's G80 chip comes with no such power-hungry AA modes. While the 16x OGSSAA mode that was still - unofficially - available on the G70 cards was a massive performance hit, leaving just one sixteenth of the original fill rate free for actual rendering, the G80's antialiasing repertoire is presently headed by its 16xQAA multisampling mode with TSSAA. While this is considerably more efficient in terms of the performance cost for image enhancement, not all of the screen's contents are processed.
In addition to polygon edges, the antialiasing samples only enhance alpha samples; textures and pixel shaders are free to flicker on their merry way. Because of this, the G70 must currently be considered to have the higher maximum possible picture quality overall, even though that is only achieved in older games, or at framerates that are no longer acceptable. To put it another way: a single non-mipmapped scene (whether through a design error or an oversight, as is so often the case with console ports) is all it would take to make today's state-of-the-art graphics cards look worse than 3dfx's 6-year-old card ;).
It is really the T-buffer that represents the focal point of this entire discussion of picture quality. Sadly, in practice its use never extended beyond antialiasing; had the T-buffer's capabilities been exploited consistently, the overall cost of improved picture quality would in turn have been much lower, because soft shadows, soft reflections, motion blur and depth of field or antialiasing could all have been calculated in a single pass (antialiasing and depth of field are fullscreen effects, so they each need a frame buffer of their own), meaning there would have been no additional processor load (apart from the complex shifting of vertex coordinates). However, there was no API extension available at the time that would have allowed developers to access these features directly: the transformations had to be done manually. A modular T-buffer engine would therefore have had to be written for these features to find their way into commercial games.
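Purely as a conceptual sketch (our own construction, not 3dfx's API; every function name here is invented), the principle might look something like this: each render pass is jittered in subpixel position and in time, and simply averaging the T-buffer contents then yields antialiasing and motion blur from the same set of passes:

import random

def render(scene, subpixel_offset, time_offset):
    """Stand-in for one render pass; returns a flat list of pixel values."""
    return [scene(x, subpixel_offset, time_offset) for x in range(8)]

def t_buffer_combine(scene, passes=4):
    # each pass gets a subpixel jitter (for AA) and a temporal jitter (for motion blur)
    jitter = [((i + 0.5) / passes, random.uniform(0.0, 1.0)) for i in range(passes)]
    buffers = [render(scene, sub, t) for sub, t in jitter]
    # average the T-buffer contents per pixel
    return [sum(px) / passes for px in zip(*buffers)]

# toy "scene": a moving edge whose position depends on time and subpixel jitter
moving_edge = lambda x, sub, t: 1.0 if x + sub < 3 + 2 * t else 0.0
print(t_buffer_combine(moving_edge))   # softened edge = AA + motion blur at once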