Guest
2017-06-21, 11:04:26
Perhaps the following information could be posted somewhere here on the forum. I think it is not uninteresting. The site where cmaier posted it has already been offline once.
I've designed both PowerPC processors for Exponential Technology and x86/x86-64 processors for AMD, so I'd be happy to answer any questions people have, preferably in English.
No problem. That quote was copied (not by me) from something I wrote on macrumors, and I was directed by somebody else (not the person who seems to have quoted me) to this site. I have a Ph.D. in electrical engineering (my dissertation involved an ultra-reduced instruction set computer) and worked at Exponential Technology (a startup that Apple invested in during the 1990s) as one of two logic designers on the x704 chip (a PowerPC). I then worked at Sun briefly on a SPARC processor before I moved to AMD, where I worked for 10 years on x86 chips, including K6 and the initial K8 (where I was in charge of the integer execution and scheduling blocks, and helped define AMD64 extensions to integer ops, among other things). I retired from the microprocessor design business a couple of years ago and became an attorney. I was directed here because someone told me there was some controversy about the CISC penalty, and since I've worked on three different RISC architectures as well as x86 (the only CISC architecture that matters) I know quite a bit about this topic.
If you want to check if I'm for real, I have been published in technical journals, for example: Maier, Cliff A. et al. (1997). "A 533-MHz BiCMOS Superscalar RISC Microprocessor". IEEE Journal of Solid-State Circuits, Volume 32, Number 11, pp. 1625–1634, as well as: Atul Garg, Y. L. Le Coz, Hans J. Greub, R. B. Iverson, Robert F. Philhower, Pete M. Campbell, Cliff A. Maier, Sam A. Steidl, Matthew W. Ernest, Russell P. Kraft, Steven R. Carlough, J. W. Perry, Thomas W. Krawczyk Jr., John F. McDonald: Accurate high-speed performance prediction for full differential current-mode logic: the effect of dielectric anisotropy. IEEE Trans. on CAD of Integrated Circuits and Systems 18(2): 212-219 (1999), and Pete M. Campbell, Hans J. Greub, Atul Garg, A. Steidl, Steven R. Carlough, Matthew W. Ernest, Robert F. Philhower, Cliff A. Maier, Russell P. Kraft, John F. McDonald: A very wide bandwidth digital VCO using quadrature frequency multiplication and division implemented in AlGaAs/GaAs HBT's. IEEE Trans. VLSI Syst. 6(1): 52-55 (1998), among others.
Okay, some facts. What is the x86 decode penalty? On K8 (Athlon 64, Opteron) we had a DE block for decode. We also had IF (instruction fetch), EX1 (superscalar issue and retire), EX2 (integer execute), FP (floating point), LS (load/store), IO (i/o's), L1 (cache), L2 (cache), etc. Each team had around the same number of people in it except for the caches which had more, and the register file block which had more. (I was in charge of EX1/EX2 and the register file, so I guess I count for two or three). Overall, about 1 in 15 people worked on x86 decode issues.
In terms of die area, decode was about the same size as EX1 + EX2. It was a fairly small sliver of the die. Maybe 2%. In terms of transistors, it was also around 1-2%. We had around 30-50M non-cache transistors (depending on whether you count buffers, and depending on which transistors you count as part of the core). I think someone threw around some numbers for how many transistors are required for x86 decode, and those numbers match my experience. It's around 1-2%.
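To put the percentages above into absolute numbers, here is a rough back-of-the-envelope sketch. It only uses the figures quoted in the post (30-50M non-cache transistors, 1-2% for decode); the function name is made up for illustration and nothing here is an official AMD figure.

```python
# Back-of-the-envelope estimate of the x86 decode transistor budget.
# Inputs are the ranges quoted in the post; the helper name is hypothetical.

def decode_transistors(core_transistors, decode_share_percent):
    """Estimate transistors spent on decode, given a share in whole percent."""
    return core_transistors * decode_share_percent // 100

# Low end: 30M non-cache transistors, 1% spent on decode.
low = decode_transistors(30_000_000, 1)
# High end: 50M non-cache transistors, 2% spent on decode.
high = decode_transistors(50_000_000, 2)

print(f"x86 decode: roughly {low:,} to {high:,} transistors")
```

Under these assumptions the decode logic lands somewhere between a few hundred thousand and about a million transistors, which is small next to the rest of the core.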
Finally, keep in mind that PowerPC and other RISC processors do not decode for free. It's not true anymore that the instruction bits set all the gates and do all the decoding for you. Both PowerPC and SPARC instruction sets do have to go through a decode stage. So the penalty I mentioned above should have the decode cost of whatever your alternative architecture is subtracted from it.
In short, the x86 penalty is pretty small in terms of design time/hardware cost/die area. The benefit of x86 is that compilers are heavily optimized for x86, and x86, while it contains a lot of garbage, also contains many features that are specifically tuned to modern operating systems and to the behavior of compilers. As a result, more of the transistors on an x86 are doing useful work in each cycle than in a RISC processor.
RISC, by leaving these things out, can result in smaller, higher-clock speed processors. But power = CV^2f, so higher clock speed is no longer considered as good as higher instructions-per-cycle. Most of the x86 decode work is dedicated to setting up the superscalar issue hardware with hints to maximize instructions-per-cycle.
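The power = CV^2f argument can be made concrete with a small sketch. All numbers below are illustrative assumptions, not measurements: units are normalized to 1.0, the 20% voltage increase needed to reach twice the clock is a guess, and so is the assumption that doubling instructions-per-cycle doubles the switched capacitance. The function names are hypothetical.

```python
# Compare two ways of doubling throughput under P = C * V^2 * f.
# Every constant here is an illustrative assumption in normalized units.

def dynamic_power(capacitance, voltage, frequency):
    """Dynamic switching power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency

def throughput(ipc, frequency):
    """Instructions per unit time = instructions-per-cycle * clock frequency."""
    return ipc * frequency

# Option A: keep the narrow core, double the clock.
# Assumption: the supply voltage must rise by 20% to reach the higher clock.
power_a = dynamic_power(capacitance=1.0, voltage=1.2, frequency=2.0)
perf_a = throughput(ipc=1.0, frequency=2.0)

# Option B: keep the clock, double instructions-per-cycle with a wider core.
# Assumption: the wider core switches twice the capacitance (more die area).
power_b = dynamic_power(capacitance=2.0, voltage=1.0, frequency=1.0)
perf_b = throughput(ipc=2.0, frequency=1.0)

print(f"A (higher clock): throughput {perf_a:.2f}, power {power_a:.2f}")
print(f"B (higher IPC):   throughput {perf_b:.2f}, power {power_b:.2f}")
```

With these assumed numbers both options deliver the same throughput, but the higher-clock option burns more power because voltage enters the formula squared. How large the gap is in practice depends entirely on how much voltage the higher clock really needs and how much extra capacitance the wider core really adds.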
Finally, there is no way A4 is PWRficient. I can't explain why I know that, but if you pay $99 to Apple, they will tell you what ISA A4 is.
Re: some of your questions. I spent the 2000s working on x86, so I was aware of PowerPC improvements only through reading the papers, and through following their engineering to make sure we were competitive. I don't know of anyone who believes PowerPC Macs are faster than current Intel Macs. I think that is nostalgia on your part (or perhaps comparing old software on old machines to new software on new machines). It certainly doesn't match my experience. Battery time may or may not be less (my Intel MBP gets 7 hours on average, which seems pretty good to me).
As for ARM benefitting, that's mainly because people are willing to throw out tons of performance for much better battery life (a sensible trade-off for mobiles). But if you tried to scale an ARM up to compete with an i7, for example, you'd end up burning just as much, if not more, power, because you'd have to run at too high a clock frequency to make up for the lack of instruction parallelism.
What I'm saying is that there's nothing you can do with a RISC that you can't do with an x86 - it's just that RISC products, unable to compete with x86 for the meat of the market, are engineered for niches (low power portable and game boxes, super high speed workstations with exotic cooling requirements, etc.). No one, not even Intel, has really started with a clean sheet of paper and seen what could be done in the handheld space with an x86, but it's likely they could get pretty close to ARM's performance/watt, though it would be a much more expensive die since it would have larger die area (it always costs more die area to increase instructions per cycle instead of frequency).
Source, with further posts by cmaier: http://www.ppcnux.de/?q=tim-cook-und-das-iphon-sdk-zum-a4
I'm happy to quote the other posts as well, provided this post isn't rejected here.