A 30% to 40% performance increase over Bulldozer is apparently possible

Aug 31, 2012 09:11 GMT

Although AMD is keeping very quiet about any new desktop processors, the architects and engineers in its R&D department are hard at work on the technology behind the architecture code-named Steamroller.

We were quite surprised to see that AMD has unveiled some of the key architectural improvements that are coming in its new Steamroller design during this year’s Hot Chips conference in Cupertino, California.

We knew AMD was planning to reveal the low-power Jaguar architecture at Hot Chips, and we reported on that earlier, but Steamroller seemed too far off to be a subject for this year’s conference.

The company has not even released desktop processors based on the new Piledriver design, so nobody was expecting to hear about its successor yet.

We have a feeling that Steamroller might come a lot sooner than many expect, but we’ll gather up more info on that before issuing any hypotheses.

Right now, we know that AMD has really done some serious work on the insides of its third-generation Bulldozer architecture.

Dirk Meyer’s Bulldozer is an interesting and revolutionary concept, but like many technologies, it was apparently released ahead of its time and, quite frankly, to disappointing results.

Today, many experts say that Piledriver is what Bulldozer was supposed to be and we say that Steamroller is what Dirk Meyer was envisioning.

AMD has decided to double the number of instruction decoders, so each integer core in a module will now have its own decoder.

Moreover, the instruction cache has been enlarged to reduce cache misses. The company claims that this will cut instruction cache misses by 30%, and we think this is a very good thing, considering Bulldozer’s high level 3 cache latency.

The instruction pre-fetch has also been improved, and AMD claims there will be 20% fewer branch mispredictions compared with Bulldozer.

Another improvement inside the Bulldozer module (2 integer units + 1 FPU) is a 25% improvement in max-width dispatch.

Together, these changes are claimed to improve operations-per-cycle performance by 30%, and AMD says it has also improved single-core execution.

The company acknowledges that latencies will remain the same, and that’s not good news, but if AMD is confident of a 30% performance improvement even with latencies unchanged, we’re satisfied.

AMD’s current designs have high memory and level 3 cache latencies, which are very hard to improve without sacrificing the high clock speed or doing a complete overhaul of the architecture.

Therefore, the company apparently focused on improving the insides of the core and left the memory and cache controllers alone.

The company is also proud to announce much-improved dynamic frequency control for many of the internal units, as well as the ability to shut down parts of the level 2 cache when they are not in use.

All in all, Steamroller might see the light of day a lot sooner than expected.
