AMD Trinity Architectural Preview - Part II

APU architecture detailed in depth right before the launch

As we said in Part I of our AMD Trinity Architectural Preview, Trinity’s x86 modules are being upgraded to the “Piledriver” design.

AMD characterizes Piledriver as a “2nd Generation Bulldozer core.”

The Piledriver compute module is composed of two integer cores sharing an FPU and a 2 MB Level 2 cache. It adopts new ISA capabilities such as AVX, AVX1.1, AES, FMA3 and F16C.

This, AMD says, will lead to an IPC improvement (IPC is short for instructions per clock). The new APUs also have less leakage and a CAC reduction, along with a frequency uplift.

The Texas-based CPU maker claims performance improvements of over 26% when it comes to overall system performance, and almost 30% increased productivity when working with a laptop.

AMD is also implementing FMA3 instructions. These are specialized SIMD instructions, just like AMD’s 3DNow! or Intel’s SSE, which is a close but more complex copy of 3DNow!.

AMD tried to develop sets of instructions specific to their own CPUs, but since most of the software available in the x86 market is compiled using Intel’s compiler, they had to add support to their CPUs for whatever new SIMD set Intel would be developing at the time.

The problem is that Intel is still practically sabotaging every other CPU maker in the company’s own compiler. Programs compiled with Intel’s software effectively ask the CPU for its vendor string. If the CPU is Intel branded, then all optimizations and SIMD units are used.

On the other hand, if the CPU answers with any vendor string other than “GenuineIntel”, then the program will ignore all the specialized units on that CPU and will run in the most un-optimized manner.
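The difference between the two dispatch policies can be sketched in a few lines of Python. This is only a simplified model, not Intel's actual dispatcher code: the `Cpu` class and both `pick_path_*` functions are hypothetical names invented for illustration, and only the two vendor strings are real CPUID values.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cpu:
    """Hypothetical stand-in for what CPUID reports; not a real API."""
    vendor: str  # e.g. "GenuineIntel" or "AuthenticAMD"
    features: frozenset = field(default_factory=frozenset)


def pick_path_by_vendor(cpu):
    # Vendor-gated dispatch: the fast path is taken only for one vendor string,
    # even if another CPU reports the very same feature flag.
    if cpu.vendor == "GenuineIntel" and "avx" in cpu.features:
        return "avx"
    return "generic"


def pick_path_by_feature(cpu):
    # Feature-gated dispatch: any CPU that reports the flag gets the fast path.
    for isa in ("avx", "sse2"):
        if isa in cpu.features:
            return isa
    return "generic"


# Two illustrative CPUs that advertise identical feature flags.
intel = Cpu("GenuineIntel", frozenset({"sse2", "avx"}))
amd = Cpu("AuthenticAMD", frozenset({"sse2", "avx"}))

print(pick_path_by_vendor(intel), pick_path_by_vendor(amd))
print(pick_path_by_feature(intel), pick_path_by_feature(amd))
```

Under the vendor-gated policy the two CPUs end up on different code paths despite reporting the same capabilities, which is the behavior described above.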

You have to give Intel credit for the fact that they have been building the best compiler for some years now and that they’re offering it for free. There are other compiler developers in the market, but since developing a compiler from scratch is so difficult and no other company has Intel’s 50 billion dollars in revenue, they’ve never been able to get better overall results than Intel’s solution.

Nobody can force Intel to include or exclude any instruction from its free compiler, but given the company’s financial power and market influence, they can practically sabotage any other player in the x86 field.

If you are a developer and you want to clean Intel’s compiler of this idiotic brand-identifying function and compile a program that will work best on both Intel and AMD CPUs, you can get more info here.

Besides, considering that AMD pays Intel for licenses to implement various Intel-compatible SIMD instructions in its processors, it’s quite a serious matter when Intel’s own compiler sabotages the processors of its licensee, AMD.

In August 2007, AMD announces the SSE5 instruction set, which includes 3-operand FMA instructions. Later on, in April 2008, Intel follows suit and announces its AVX and FMA instruction sets, including 4-operand FMA instructions.

Seeing the increased FMA4 complexity and its use, AMD implements FMA4 in the Bulldozer design, reasoning that its CPUs had better offer the same capability as Intel’s.

Fearing the hype about AMD’s Bulldozer design and not wanting to risk having its own compiler optimize programs for the same type of SIMD units as the ones in AMD’s CPUs, Intel changes the specification for its FMA instructions from 4-operand to 3-operand instructions.

The change occurred in December 2008, a full half year before AMD announced the FMA4 implementation, but by then AMD was too advanced with their design to make the switch back to FMA3.

The FMA3 and FMA4 SIMD instructions are very useful: when software is optimized correctly for them, it will run a lot faster on a CPU that supports the matching FMA variant than on a CPU with a different FMA unit or none at all.
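For readers unfamiliar with the operation itself, a fused multiply-add computes a*b + c as a single step with one rounding at the end. Here is a minimal Python sketch of that idea, assuming ordinary double-precision floats; the function names are made up for illustration, and the fused version is emulated with exact rational arithmetic rather than a hardware instruction. It shows only the rounding behavior; the speed gain comes from issuing one instruction instead of two, which Python cannot demonstrate.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    # Emulated FMA: compute a*b + c exactly with rationals, then round once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_multiply_add(a, b, c):
    # Two operations, two roundings: the product is rounded before the add.
    return a * b + c


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0 as a double.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(separate_multiply_add(a, b, c))  # the tiny term is lost in the product
print(fused_multiply_add(a, b, c))     # the tiny term survives the single rounding
```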

Intel’s duck-strafe-strafe strategy succeeded because the FMA3 and FMA4 sets are incompatible with one another. Intel’s compiler was built to optimize for FMA3, while AMD’s SSE5 with FMA3 was never implemented. Instead, AMD’s Bulldozer got FMA4 that used four operands with practically no compatibility with any FMA3 optimizations.
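The operand-count difference can be pictured with a toy register file. The sketch below is a deliberately simplified model: the `fma3` and `fma4` functions and the register names are hypothetical, and real FMA3 comes in several variants that differ in which operands are multiplied. It only shows why code generated for one form does not map directly onto the other: the three-operand form overwrites one of its inputs, while the four-operand form writes to a separate destination.

```python
def fma3(regs, dst_src, src2, src3):
    # Three-operand form: the first operand is both an input and the destination.
    regs[dst_src] = regs[dst_src] * regs[src2] + regs[src3]


def fma4(regs, dst, src1, src2, src3):
    # Four-operand form: the destination is separate, so all three inputs survive.
    regs[dst] = regs[src1] * regs[src2] + regs[src3]


# Illustrative register contents; "x3" serves as a spare destination.
initial = {"x0": 2.0, "x1": 3.0, "x2": 4.0, "x3": 0.0}

regs3 = dict(initial)
fma3(regs3, "x0", "x1", "x2")        # x0 = x0 * x1 + x2, the old x0 is gone

regs4 = dict(initial)
fma4(regs4, "x3", "x0", "x1", "x2")  # x3 = x0 * x1 + x2, x0 is preserved

print(regs3)
print(regs4)
```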

Therefore, AMD changed its mind again and went back to FMA3 when it comes to 128-bit SIMD x86 instructions, as Intel announced it would be supporting FMA3 in their Haswell processors in 2013 and Broadwell processors in 2014.

Besides the improved IPC characteristics and the newly added SIMD extensions, AMD will also attempt to achieve a higher processor working frequency using some very interesting novel technologies.

More about Trinity's frequency scaling and technologies in our AMD Trinity Architectural Preview - Part III.
