375 Gflops on a PCI-E board

Nov 15, 2006 08:46 GMT  ·  By

As some of you may know, Blue Gene/L is the fastest supercomputer in the world. With its 131,072 processor cores (65,536 dual-core processors), it achieves a peak performance of 367 TFlops, which puts it in 1st place on the TOP500 list. There is virtually no competition here, since the 2nd spot is held by an Opteron cluster that is at least two times slower than Blue Gene. But AMD thinks it can change that.

The idea behind a "stream processor" is actually quite simple. If you have ever wondered whether a GPU can be used to process data other than what 3D rendering requires, the answer is yes. The main obstacle has been the lack of software written to use the GPU this way. At least, that was the case until several weeks ago, when the Folding@Home project announced a client capable of using the GPUs of the X1900 series for specific computations. And that rang the right bell in AMD's head.

Since AMD now owns ATI, it is in a position to come up with original solutions to this problem. The stream processor is still an X1900 GPU mounted on a PCI-E board, but it features 1GB of GDDR3 memory and a modified memory controller that enables stream computing applications, making good use of the GPU's 48 independent cores. Boards can be interconnected using Crossfire technology. And if you take into account that about 1000 such boards would match Blue Gene's peak performance, you'll get the idea. OK, maybe it's not cheap at $2600, but Clearspeed asks 3 times more for its accelerator, which can only deliver about 100 Gflops.
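To make the arithmetic behind that comparison explicit, here is a rough back-of-the-envelope sketch in Python. It uses only the peak figures quoted above and assumes that "3 times more" means exactly three times the board price. The function and constant names are made up for illustration, and the calculation ignores interconnect overhead and the gap between peak and sustained performance, so treat the results as a ballpark rather than a benchmark.

```python
import math

# Peak figures as quoted in the article (not sustained performance).
BOARD_GFLOPS = 375.0        # one AMD stream processor board
BLUE_GENE_TFLOPS = 367.0    # Blue Gene/L peak
BOARD_PRICE_USD = 2600.0

# Assumption: "3 times more" is read as exactly 3x the board price.
CLEARSPEED_PRICE_USD = 3 * BOARD_PRICE_USD
CLEARSPEED_GFLOPS = 100.0


def boards_needed(target_tflops, board_gflops):
    """Smallest number of boards whose combined peak reaches the target."""
    return math.ceil(target_tflops * 1000.0 / board_gflops)


def usd_per_gflop(price_usd, gflops):
    """Price of one Gflop of peak performance."""
    return price_usd / gflops


boards = boards_needed(BLUE_GENE_TFLOPS, BOARD_GFLOPS)
total_cost = boards * BOARD_PRICE_USD

print(f"Boards needed to match Blue Gene/L peak: {boards}")
print(f"Cost of those boards alone: ${total_cost:,.0f}")
print(f"AMD board: ${usd_per_gflop(BOARD_PRICE_USD, BOARD_GFLOPS):.2f} per Gflop")
print(f"Clearspeed: ${usd_per_gflop(CLEARSPEED_PRICE_USD, CLEARSPEED_GFLOPS):.2f} per Gflop")
```

Under these assumptions the count comes out at 979 boards, which is where the "about 1000" figure comes from, and the AMD board works out to roughly a tenth of Clearspeed's price per Gflop.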

The remaining problem, aside from the manufacturing process, is software capable of using such a product. And AMD knows that. It has recently started to promote a new hardware layer called Close To Metal (CTM), which will provide direct access to the GPU's instruction set and memory.

What remains to be seen is exactly what kind of applications can be written for such hardware. AMD claims that encoding and physics computation are among the applications that will work best on it.