Leonidas
2002-03-30, 19:31:43
AMD’s Gigahertz Equivalency: Inexperienced Buyers Accept Bad Science
http://www.aberdeen.com/ab%5Fcompany/hottopics/amd/default.htm
Aberdeen Group
AMD’s Gigahertz Equivalency: Inexperienced Buyers Accept Bad Science
An Executive White Paper
March 2002
Aberdeen Group, Inc.
One Boston Place
Boston, Massachusetts 02108 USA
Telephone: 617 723 7890
Fax: 617 723 7897
www.aberdeen.com
Executive Summary
Millions of inexperienced PC (personal computer) buyers are confronted with a
fire hose of sophisticated technology speak and feel intimidated even though they
want to make informed buying decisions, including those regarding PC performance.
These buyers around the world seek the holy grail of a single performance
metric for choosing PC processors, thinking they will be able to select a microprocessor
based on, for example, clock speed as measured in gigahertz (GHz)
just as they select the next family car’s motor by horsepower. But, whereas auto
horsepower is measured by automobile-industry standards, there is no single
accepted performance metric that allows buyers to compare microprocessor
performance across a wide spectrum of PC usage.
Comparisons between processors certainly cannot be made between different
processor architectures based on clock speed for the simple fact that different
processor architectures do different amounts of work in each clock tick — not to
mention application dependencies. Nevertheless, Advanced Micro Devices (AMD)
last year deliberately took a step down a slippery slope of bad science when it
named its Athlon XP line of microprocessor models with clock-speed gigahertz ratings
equivalent to Intel’s competing Pentium 4 (P4), based on a set of application
benchmarks audited by Arthur Andersen and fully described in AMD vs. Intel comparisons
at AMD’s Web site. Though AMD can say the new model numbers refer
only to its own microprocessors, clearly the competitive comparisons are to Intel’s
microprocessors. AMD will undoubtedly pay a marketing price in 2002 for the bad
science that has confused the market and many PC buyers, as it must soon retreat
from the gigahertz equivalency positioning and take another performance rating
approach.
What’s the flaw in AMD’s equivalency ratings? There are many discussed in this
Aberdeen Executive White Paper. The key flaw is that the equivalency rating is a
snapshot in a moment in time — and time surely marches on in the computer industry
— making the gigahertz equivalency subject to increasing variance over time.
For example, the AMD Athlon XP 2000+ processor announced last fall runs at 1.667
GHz. The 2000+ equivalency rating is aimed at Intel’s P4 2.0 GHz Willamette processor.
In less than six months, the 2000+ equivalency rating is no longer factual
and scientific for the following reasons: numerous performance improvements
have been released by application software suppliers; the 2001 edition benchmarks
used by AMD now have 2002 versions; and Intel introduced the P4 2.0A GHz
Northwood processor, which uses more cache and a new manufacturing process to
deliver significantly greater benchmark performance at the same 2.0 GHz clock rate.
The comparisons will only get worse over time as benchmarks, the operating
systems, and applications evolve. Comparing the Athlon XP 2000+ model to the
P4 2.0A Northwood with benchmarks against the original “equivalent processor”
Willamette is not justifiable in the benchmark science, and is confusing the market.
Moreover, AMD’s processor equivalency methodology is flawed at the core: It
assumes a specific application usage model that does not apply to many users; it is
platform-specific, ignoring critical differences such as memory type; and the methodology
uses system-level benchmarks including Input/Output (I/O), an approach not
used to measure processors alone. There is a distinct “Pinocchio factor” that will
only grow over time as pseudo-equivalency gradually becomes patently inaccurate.
Last October, AMD said it would seek a new means of rating performance through
a True Performance Initiative (TPI). Aberdeen applauds this step because AMD will
be well served to have this gigahertz equivalency fiasco behind it. The fact is that
AMD processors are quite efficient for lots of applications and do not need the
steroids of hokey equivalency to deserve market respect.
Processor Performance Should Not Be Measured in Gigahertz
Gigahertz alone is a meaningless indicator of performance. It only indicates how
often the processor clock ticks, causing computing to be done, measured in billions
of ticks per second. Gigahertz tells nothing about how much work gets done
in each clock tick. What buyers are really seeking is some measure of throughput
— i.e., how much useful application work is accomplished per unit of time.
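To make the distinction concrete, consider a minimal arithmetic sketch in Python. The clock rates and work-per-tick figures below are illustrative assumptions, not measured values for any shipping processor; the point is only that throughput is the product of clock frequency and the average work completed per tick.

    # Illustrative sketch: throughput = clock frequency x average work per clock tick.
    # The work_per_tick figures are assumptions for illustration, not measurements.
    def throughput(clock_ghz: float, work_per_tick: float) -> float:
        """Billions of units of application work completed per second."""
        return clock_ghz * work_per_tick

    chip_a = throughput(clock_ghz=1.53, work_per_tick=1.3)  # slower clock, more work per tick
    chip_b = throughput(clock_ghz=2.00, work_per_tick=1.0)  # faster clock, less work per tick
    print(f"A: {chip_a:.2f}, B: {chip_b:.2f}")               # A: 1.99, B: 2.00

With these assumed figures, the two chips deliver nearly identical throughput despite a roughly 30% gap in clock rate, which is precisely why clock speed alone cannot rank processors.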
Choosing a Usage Model
A usage model is the result of a workload characterization. The model makes assumptions
about the application mix that PC users will be running on their machines.
In the real world, the actual mix is very personal and depends on what
applications are used, for how long, and in what context — work, mobile, or at
home. Because personalized usage models are impractical, PC processor manufacturers
and the trade press typically report results on a variety of industry benchmarks
across several usage categories, including the following:
• Office productivity — Microsoft Office applications, anti-virus, and e-mail;
• Internet — Web browsing, Macromedia Flash, Adobe Acrobat, Microsoft
or Real media player;
• Content creation — paper and Internet content-creation applications for
either graphics workers or hobbyists, including applications such as
Adobe Premiere and Photoshop, Macromedia Dreamweaver, and Sonic
Foundry’s Sound Forge; and
• Entertainment/3D gaming — 3D Winbench, 3DMark, Quake, and others
measured in frames per second.
Both content creation and 3D gaming are extremely processor- and memory-access
intensive; the office productivity and Internet application benchmarks stress the entire
system, including I/O, even though the processor may not be running at capacity.
In addition, the Standard Performance Evaluation Corporation (SPEC) has a
widely accepted processor benchmark called SPEC CPU2000 that is designed “to
provide a comparative measure of compute-intensive performance across the widest
practical range of hardware.” All of the major microprocessor manufacturers
are members of, and report results to, SPEC.
AMD’s Processor Benchmark and Model Number Methodology
AMD’s workload characterization white paper assumes a usage model where each
of three components is measured equally (i.e., in thirds):
1. Office Productivity;
2. Content Creation; and
3. 3D Gaming.
AMD selected a variety of PC applications including synthetic benchmarks to represent
the workloads in these three categories. These tests were then run on P4
machines running at 1.5 GHz, 1.6 GHz, 1.7 GHz, and 1.8 GHz and Athlon XP machines
running at 1.33 GHz (model 1500+), 1.4 GHz (1600+), 1.47 GHz (1700+),
and 1.53 GHz (1800+). The results were then normalized, using the P4 at 1.5
GHz as the “100%” normalization point. The benchmark tests were audited by
Arthur Andersen using the attestation standards of the American Institute of Certified
Public Accountants (AICPA).
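A brief sketch of the normalization mechanics described above may help. The benchmark scores used here are placeholder values, not AMD’s published results; the sketch only shows how each category result is expressed as a percentage of the P4 1.5 GHz baseline and how the three equally weighted categories are averaged into a single composite figure.

    # Sketch of the normalization step described above.
    # All raw scores below are placeholder values, not AMD's published results.
    BASELINE = "P4 1.5 GHz"

    raw_scores = {  # category -> {system -> raw benchmark score}
        "office_productivity": {"P4 1.5 GHz": 100.0, "Athlon XP 1800+": 112.0},
        "content_creation":    {"P4 1.5 GHz": 100.0, "Athlon XP 1800+": 118.0},
        "3d_gaming":           {"P4 1.5 GHz": 100.0, "Athlon XP 1800+": 121.0},
    }

    def composite(system: str) -> float:
        """Average of the three categories, each normalized to the baseline (100%)."""
        normalized = [100.0 * scores[system] / scores[BASELINE]
                      for scores in raw_scores.values()]
        return sum(normalized) / len(normalized)  # equal one-third weighting

    print(f"{composite('Athlon XP 1800+'):.1f}%")  # 117.0% of baseline with these placeholders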
Straight-faced, Aberdeen reports that AMD’s benchmark white paper concludes
that AMD processors outperform Intel’s Pentium 4, and therefore deserve the gigahertz
equivalency model numbers assigned by AMD. However, Aberdeen is not
persuaded by the way the benchmark was conducted, and we do not accept AMD’s
claims for the usefulness to buyers of the gigahertz equivalency ratings.
Flaws in AMD’s Benchmark Methodology
Aberdeen did not expect to find flaws in how AMD conducted the benchmarks that
led to the gigahertz equivalency model numbers now in use, especially with an attestation
letter from public accountants Arthur Andersen. Our complaints are not
trivial: They strike at the very nature of benchmark fairness. After all, the stated
purpose of AMD’s gigahertz equivalency benchmarks is to provide a scientific basis
for what would otherwise be an exercise in mere “benchmarketing” — i.e., the use
of benchmarks to advance marketing purposes. Aberdeen’s examination focused
on comparisons of the benchmarks with publicly reported results by both Intel
and AMD — BAPCo’s SYSmark 2001 and the Quake III game.
BAPCo’s SYSmark 2001 rules state that any time SYSmark results are used in public,
they must be reported to BAPCo. There are no results reported to BAPCo
as of February 15, 2002, for the Athlon XP 2000+. SYSmark 2001 is particularly
important because it counts for slightly more than one-third of the entire performance
score used to derive gigahertz equivalency in AMD processors. Arthur
Andersen’s audit attestation is specifically about the 2000+.
Microsoft Media Player Enhancement
An AMD and Microsoft enhancement to Media Player, a component of SYSmark 2001, allows
Media Player to take advantage of instructions specific to the Athlon XP. This enhancement
will be available in a Windows XP software update. AMD included results
both with and without the enhancement in its content creation results — a situation
no real user would have, because a user would either have the enhancement
or not, but not both conditions.
Inconsistent Results
There are numerous small discrepancies between the SYSmark 2001 results reported
in AMD’s white paper and on its benchmark audit site and those reported
on BAPCo’s site. While the differences between reported results are small, Aberdeen’s
experience suggests that these are indications of repeatability problems,
audit problems, or jumping the publication gun. The discrepancies detract
from and draw attention to what should be consistent and above board.
Obsolete Benchmarks
Winbench 2000 is a component of the entertainment suite. It measures processor-
intensive graphics performance using Microsoft’s DirectX 7.0 3D software. But
all the AMD and Intel systems were tested with Windows XP, which comes with
DirectX 8.1. It is unlikely that any real user, particularly a gamer, would load an
older version of an operating system component. DirectX 8.1 is tested in AMD’s
methodology using MadOnion’s 3DMark 2001.
Intel Beats AMD on Intel
Intel scores at BAPCo beat the results reported by AMD for the same processor on
SYSmark 2001. The same is true with Quake III Arena Demo Arena II. It should
not be surprising that Intel knows how to tune its systems better than AMD. But,
the lower results achieved and reported by AMD do, in turn, improve AMD’s gigahertz
equivalency to Intel because lower Intel results improve AMD by comparison.
Flaws in the Concept of Gigahertz Equivalency Methodology
Even if the benchmarks used to derive gigahertz equivalency were pristinely conducted,
Aberdeen Group would still disagree with the concept that a single gigahertz
equivalency expressed in a model number is applicable in the real world for
more than a few days after the benchmark exercise is completed. The concept of
gigahertz equivalency for setting a processor rating has the following traits:
• It implies a fixed relationship over time, something that seldom happens
in the fast-changing computer industry;
• It implies a strict workload usage model that may not be applicable;
• It mixes system performance on I/O with processor-intensive work — a
poor way to judge the processor alone;
• It must be consistently applied; and
• It is totally platform-dependent.
Time Waits for No Computer
Tried to sell or give away that old 486 machine lately? Then you know that nobody
wants it; it’s as obsolete as a buggy whip. What happened — it still runs fine,
right? What happened is that the computer industry advanced. And, as a result,
what was cutting-edge performance costing thousands of dollars is now a doorstop.
The same applies to snapshot-in-time gigahertz ratings. Since AMD wrote its
Benchmark and Model Numbering Methodology white paper in early October, the
following has happened:
• Microsoft Windows XP was launched in mid-October 2001. Since then,
numerous application patches have been issued to make these applications
more compatible with and perform better with Windows XP, including
applications such as Office 2000 that are included in AMD’s benchmark
suite. But AMD’s methodology only uses the original Windows XP
CD-ROM, allowing for no changes over time. Not realistic.
• One of those improved applications is Windows Media Player 7.0. AMD
and Microsoft made an enhancement. Performance will improve; that is
the industry norm. Note that this particular change, which favors AMD,
was incorporated in the results. But the enhancement was made and
tested after Arthur Andersen did its audit, so the improved numbers
used to justify AMD’s processor gigahertz equivalents stand out. That
they changed in less than two months is the point: There will always be
change. The same AMD processor will do more work in time because
PC application software will be more tailored to the processor. The
solution is certainly not to issue little stickers to put on the processor’s
heat sink that say an 1800+ is now an 1856+.
• Some application improvements will benefit AMD alone, such as the previously
mentioned Media Player enhancement. Other application changes
will undoubtedly take advantage of Intel’s unique P4 micro-architecture
and instructions to the relative detriment of AMD’s processor results.
Thus, equivalency will be highly application-dependent at best.
• The benchmarks underlying the AMD methodology are changing over
time. Ziff Davis’ Winstone 2001 has been replaced with Winstone
2002, acknowledging improvements and changes in the underlying
applications themselves. It would not be realistic for AMD to freeze the
benchmark suite, but if AMD does not, then the entire processor equivalency
metric changes over time as the benchmarks change — even while
the clock rate stays the same.
These uncertainties and unrealities — which deepen over time — drive Aberdeen’s
deep distrust of the underlying methodology used by AMD to derive processor
gigahertz-equivalent ratings.
Workload Characterization: Nine to Five, or Nine in the Morning to Nine at Night?
AMD’s methodology presumes a workload that is one-third office, one-third digital
content creation, and one-third entertainment/gaming. Assuming Internet usage is
measured in that mix, Aberdeen suggests that AMD has the categories correct. It is
the usage percentages that make the difference: Should gaming represent 33%, or
15%, or none? No one should play God with computer users’ lives, but that means
leaving the workload mix ratios up to the buyer, which implies not having a single
processor performance metric on which to base processor equivalency.
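The sensitivity of the rating to that assumption can be shown with a small sketch. The per-category scores and the alternative weightings are assumptions chosen for illustration; the point is that the same measured results yield a different composite rating as soon as the workload mix changes.

    # Illustrative only: per-category scores and weightings are assumptions.
    category_scores = {"office": 104.0, "content_creation": 118.0, "gaming": 125.0}

    def weighted_score(weights):
        """Composite score under a given workload mix (weights sum to 1)."""
        return sum(weights[c] * category_scores[c] for c in category_scores)

    equal_thirds = {"office": 1/3, "content_creation": 1/3, "gaming": 1/3}
    office_heavy = {"office": 0.70, "content_creation": 0.15, "gaming": 0.15}

    print(f"equal thirds: {weighted_score(equal_thirds):.1f}")  # ~115.7
    print(f"office-heavy: {weighted_score(office_heavy):.1f}")  # ~109.3

Under these assumed numbers, shifting from an equal-thirds mix to an office-heavy mix moves the composite by more than six points, enough to change which model number a processor would appear to deserve.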
Consistency Counts
AMD’s new mobile Athlon 4 processor was announced on January 28 with a megahertz-
equivalency model number of 1500+. However, AMD’s methodology for
megahertz-equivalency rating of laptop processors uses different benchmarks on a
different operating system than for AMD desktop processors, and no methodology
has been publicly reported for the 1500+ rating. Consistency is essential to rating
processor performance. The most important reason is that Aberdeen research
shows that more than half the laptops are bought as desktop replacements. Therefore,
Aberdeen suggests performance should be measured the same on the road as
at the home or the office.
AMD’s earlier mobile megahertz-equivalency is relative to Intel’s Pentium III, while
desktop megahertz-equivalency is relative to P4. That makes it impossible to compare
AMD’s own chips against each other based on megahertz-equivalency-based
model numbers. Moreover, the AMD mobile megahertz-equivalency position
changes over the next 90 days as Intel complements the Pentium III Mobile with a
Pentium 4 Mobile. Aberdeen concludes that AMD desktop and mobile methodologies
are not consistent.
System Input/Output Is Not Processor Performance
System platforms are changing and thus are a poor way to measure a specific
processor. For example, next year’s 10,000 RPM mainstream disk drives will deliver
considerably higher performance in office productivity applications, with lesser
improvements in content creation and entertainment. Should a faster disk alone
lead to a faster gigahertz equivalency rating? What about already rated processors
that can also benefit from faster I/O? What about the fact that the processor is idle
— zero gigahertz of work done — while the disk is being accessed; how should
that affect processor ratings? AMD has chosen to include I/O-intensive benchmarks
such as Business Winstone 2001 even though these are system-level benchmarks
designed to match Dell PCs versus Compaq, not AMD versus Intel processors.
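A small sketch illustrates why that matters. Assume, purely for illustration, that a benchmark run spends part of its elapsed time on processor work and part waiting on disk I/O; a faster disk then raises the system-level score even though the processor does no more work per second.

    # Illustrative assumption: elapsed time = CPU time + disk wait time.
    def elapsed(cpu_seconds: float, io_seconds: float, disk_speedup: float = 1.0) -> float:
        """Benchmark run time with the I/O portion scaled by a faster disk."""
        return cpu_seconds + io_seconds / disk_speedup

    baseline    = elapsed(cpu_seconds=60.0, io_seconds=40.0)                    # 100 s
    faster_disk = elapsed(cpu_seconds=60.0, io_seconds=40.0, disk_speedup=2.0)  #  80 s
    print(f"score gain from the disk alone: {baseline / faster_disk:.2f}x")     # 1.25x

The processor is identical in both runs, yet a score-based equivalency rating would credit it with a 25% improvement.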
In January, Intel delivered a new architecture variant to the P4, code-named
Northwood. It is available at the same 2.0 GHz clock rate as before, and in faster models too, but
performs better due to architectural improvements such as a larger cache and a
new 130 nanometer manufacturing process. All subsequent P4s will build off this
new technology base. That means the Athlon XP 2000+ is matched against an obsolescent
processor, making for bogus comparisons with the new 2.0A GHz Intel P4
processor.
Moreover, differences in chip sets, memory architecture and speed, device driver
maturity, and countless other differences are unique to a specific platform — not
to a specific processor. Many components in a platform will change during the
production life of a processor chip, but the chip’s inherent clock rate will not.
Aberdeen Conclusions
Microprocessor performance measurement is a complex subject even for engineers,
so consumers seeking a single metric tailored to their own specific usage workloads
need to be told that it is impractical if not impossible. The closest the computer
industry has come to an accepted processor benchmark is SPEC CPU2000, but that
benchmark is not directly tied to specific PC applications. Nevertheless, before
AMD goes too far down the True Performance Initiative road, Aberdeen suggests a
hard look at SPEC CPU2000 is in order. After all, AMD is a member of SPEC.
Aberdeen has concerns regarding the inconsistencies and flaws in AMD’s benchmarks
that measure Intel and AMD systems. But there is no smoking gun, even
though Arthur Andersen ought to explain the discrepancies between its audited
results and what is reported at BAPCo. Aberdeen does not think Andersen’s examination
of these benchmarks will enhance consumer confidence in industry
benchmark results.
Aberdeen is in serious disagreement with AMD over the basic methodology used to
determine processor gigahertz equivalency, believing it is fundamentally flawed in
measuring processor apples and I/O oranges, using an arguable workload mix that
will not stand the test of time. The speed of light — and electrons in processors
— today ought to be the same speed tomorrow. AMD’s methodology will not
support that without becoming quaintly obsolete in another 90 days as the methodology
breaks down completely over time. Bad science in the name of customer
advocacy is not the approach to market that Aberdeen recommends. Aberdeen is
not alone in questioning gigahertz equivalency. Close reading of the PC trade
press and PC performance Web sites shows an increasing skepticism, and it would
not be surprising to see the press turn surly.
The AMD True Performance Initiative program, announced in October as truly
strategic, has sunk beneath a wave of performance-equivalency press releases.
Progress needs to be announced soon about TPI or else the initiative will appear
to be a marketing ploy.
We have seen advertisements in Europe and on the Web that take gigahertz ratings
from sublime to ridiculous: naively reporting gigahertz-equivalent AMD model
numbers as actual gigahertz. Aberdeen questions whether AMD is feinting in one
direction by arguing that gigahertz ratings are meaningless, while hoping the
market will “up-clock” its products. Meanwhile, buyers remain confused.
Clearly our research backs AMD’s claims that gigahertz alone is a poor measure
of processor performance. But the gigahertz equivalency methodology baked into
the model number, as practiced by AMD, is yet another meaningless indicator of
performance. Aberdeen Group doubts that this year will pass without a new performance
measurement methodology from AMD, requiring a wholesale readjustment
in model numbers — past, present, and future.