Kepler is actually less potent than Fermi when playing Intel’s FP64 game

Jun 27, 2012 11:41 GMT

The second problem Nvidia has with its new Kepler architecture is that its raw double-precision compute power is actually less impressive than that of the company’s previous architecture, Fermi.

Make sure you read the first part of our GPU compute analysis.

Sure, Kepler is easier to program for and is even able to run a basic operating system, but more raw power is what would have let it stand tall ahead of Intel’s new MIC product line.

Nvidia’s main problem is that Intel touts 1 TFLOPS of real-world double-precision (FP64) performance with the first iteration of its Xeon Phi cards.

AMD holds up quite well in that respect: the current Radeon HD 7970 (Tahiti) GPU delivers a peak of 947 GFLOPS in FP64 for a much lower price than Xeon Phi, while the new Radeon HD 7970 GHz Edition actually surpasses Intel’s goal by a margin of about 7.5%.

Offering this much performance without any “professional” price tag is quite an achievement for AMD’s team.
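The FP64 figures above follow from simple peak-throughput arithmetic. The sketch below assumes AMD’s published Tahiti specs, which the article does not state: 2048 stream processors, a 925 MHz clock for the HD 7970 and a 1,050 MHz boost clock for the GHz Edition, one fused multiply-add (two floating-point operations) per shader per clock, and FP64 running at one quarter of the FP32 rate. `peak_gflops` is a hypothetical helper written only for this illustration.

```python
# Peak FP64 arithmetic for AMD's Tahiti cards.
# ASSUMED specs (not from the article): 2048 stream processors,
# 925 MHz (HD 7970), 1,050 MHz boost (GHz Edition), FP64 at 1/4 of FP32.

def peak_gflops(shaders: int, clock_ghz: float, fp64_ratio: float = 1.0) -> float:
    """Theoretical peak in GFLOPS: each shader retires one fused multiply-add
    (2 floating-point operations) per clock, scaled by the FP64:FP32 rate."""
    return shaders * 2 * clock_ghz * fp64_ratio

INTEL_TARGET_GFLOPS = 1000.0  # Intel's quoted 1 TFLOPS FP64 figure for Xeon Phi

hd7970 = peak_gflops(2048, 0.925, fp64_ratio=0.25)      # 925 MHz
hd7970_ghz = peak_gflops(2048, 1.050, fp64_ratio=0.25)  # 1,050 MHz boost clock

# How far the GHz Edition's peak sits above Intel's 1 TFLOPS figure.
margin_pct = (hd7970_ghz / INTEL_TARGET_GFLOPS - 1) * 100

print(f"HD 7970:             {hd7970:.1f} GFLOPS FP64")
print(f"HD 7970 GHz Edition: {hd7970_ghz:.1f} GFLOPS FP64")
print(f"Margin over 1 TFLOPS: {margin_pct:.1f}%")
```

Under these assumptions the sketch reproduces the 947 GFLOPS figure and puts the GHz Edition at about 1,075 GFLOPS, roughly 7.5% above Intel’s 1 TFLOPS. Keep in mind that these are theoretical peaks, while Intel pitches its number as real-world throughput.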

In fact, Nvidia’s top-performing part in FP64 is still the Fermi-based Tesla M2090, which is rated at a peak double-precision performance of 665 GFLOPS, or about 0.67 TFLOPS.
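The M2090’s 665 GFLOPS is likewise a theoretical peak derived from core count and clock, not a measured result. A minimal sketch, assuming figures the article does not list: 512 CUDA cores, a 1.3 GHz shader clock, one fused multiply-add per core per clock, and FP64 at half the FP32 rate on Fermi-based Tesla parts. The helper and constant names are hypothetical.

```python
# Tesla M2090 peak rates.
# ASSUMED specs (not from the article): 512 CUDA cores, 1.3 GHz shader clock,
# FP64 at 1/2 of the FP32 rate.

def peak_gflops(cores: int, clock_ghz: float, rate: float = 1.0) -> float:
    """Theoretical peak GFLOPS: one FMA (2 FLOPs) per core per clock,
    scaled by the precision's rate relative to FP32."""
    return cores * 2 * clock_ghz * rate

M2090_CORES = 512      # hypothetical constant name; assumed core count
M2090_CLOCK_GHZ = 1.3  # hypothetical constant name; assumed shader clock

sp = peak_gflops(M2090_CORES, M2090_CLOCK_GHZ)            # FP32 peak
dp = peak_gflops(M2090_CORES, M2090_CLOCK_GHZ, rate=0.5)  # FP64 at half rate

print(f"M2090 FP32 peak: {sp:.1f} GFLOPS")
print(f"M2090 FP64 peak: {dp:.1f} GFLOPS ({dp / 1000:.2f} TFLOPS)")
```

The FP64 line comes out at 665.6 GFLOPS (0.67 TFLOPS), which matches the rated figure once rounded.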

How did Nvidia end up with a new generation of GPU compute accelerators that is slower in FP64 than the previous one?

The answer is that Nvidia was not targeting FP64 performance with its current Tesla generation: it built the new Tesla K10 compute card from two Kepler GK104 GPUs, which are geared toward single-precision work.

As a result, the K10 reaches an impressive peak of 4.6 TFLOPS of single-precision (FP32) compute performance.

That’s 343% of the performance of the Fermi-based Tesla M2090, but single-precision throughput is not what Intel is offering.

Remember that Intel emphasizes double-precision (FP64) performance rather than single-precision.
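The same peak arithmetic shows why the K10’s lead exists only in single precision. This sketch assumes Nvidia’s published board figures, none of which appear in the article: the K10 pairs two GK104 GPUs with 1536 CUDA cores each at 745 MHz, GK104 executes FP64 at 1/24 of its FP32 rate, and the M2090 has 512 cores at 1.3 GHz with half-rate FP64. `peak_gflops` is a hypothetical helper.

```python
# Tesla K10 vs. Tesla M2090 peak rates.
# ASSUMED specs (not from the article):
#   K10:   2 x GK104, 1536 CUDA cores each, 745 MHz, FP64 at 1/24 of FP32
#   M2090: 1 x Fermi GPU, 512 CUDA cores, 1.3 GHz, FP64 at 1/2 of FP32

def peak_gflops(cores: int, clock_ghz: float, rate: float = 1.0) -> float:
    """Theoretical peak GFLOPS: one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * clock_ghz * rate

k10_sp = 2 * peak_gflops(1536, 0.745)               # two GPUs per board
k10_dp = 2 * peak_gflops(1536, 0.745, rate=1 / 24)

m2090_sp = peak_gflops(512, 1.3)
m2090_dp = peak_gflops(512, 1.3, rate=0.5)

print(f"FP32: K10 {k10_sp:.0f} vs M2090 {m2090_sp:.0f} GFLOPS "
      f"({k10_sp / m2090_sp:.0%} of M2090)")
print(f"FP64: K10 {k10_dp:.0f} vs M2090 {m2090_dp:.0f} GFLOPS")
```

Under these assumptions the FP32 ratio is about 3.44x, in line with the 343% quoted above, while the K10’s FP64 peak is only about 191 GFLOPS, well under a third of the M2090’s 666 GFLOPS.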

Find out more about Nvidia’s plans for GPU compute in our third article.
