GPUs taking over your horizontal and vertical

Dec 13, 2006 15:14 GMT

Today we are going to take a closer look at the GPU. The first part of this article covers a brief history of graphics hardware, while the second part presents some of the GPU's important inner workings.

GPU Lineage

The 1990s saw a glorious development of the graphics card market, yet few of the companies involved have managed to survive to 2006. Things weren't all that harsh in the beginning, though. To understand the most important moments that shaped the sinuous path of graphics processing hardware, let's take a look into the past.

Modern GPUs are descended from the monolithic graphics chips of the late 1970s and 1980s. These chips had limited BitBLT support in the form of sprites and usually had no shape-drawing support. Some could run several operations from a display list and could use DMA to reduce the load on the host processor; an early example was the ANTIC co-processor used in the Atari 800 and Atari 5200. Graphics cards were quite simple and many were specialized for just a couple of tasks. As chip process technology improved, it eventually became possible to move drawing and BitBLT functions onto the same board (and, eventually, into the same chip) as a regular frame-buffer controller such as VGA. These were only capable of 2D acceleration and were not as flexible as the microprocessor-based graphics co-processors.

The 1980s introduced the Commodore Amiga, the first mass-market computer to include BitBLT support in its video hardware, while IBM's 8514 graphics system was one of the first PC video cards to implement 2D primitives in hardware. In 1987, IBM created the VGA standard, which supported only 256 colors on screen (keep in mind that present-day standards such as Quad Extended Graphics Array, which allows the display of millions of colors at resolutions up to 2048 x 1536 pixels, are quite common). The Amiga set the trend for that time, as it featured what would now be recognized as a full 2D accelerator, offloading practically all video generation functions to its dedicated hardware. This video hardware included a fast BitBLT processing engine, a hardware sprite engine, hardware display scrolling, and hardware resources to draw lines, fills and other primitives. This revolutionary design helped offload the CPU and gave a great boost to the video capabilities of home computers.

By the early 1990s, the dominance of Windows meant PC graphics vendors could now focus their development efforts on a single programming interface - the Graphics Device Interface (GDI).

In 1991, S3 introduced the 86C911, the first single-chip 2D accelerator; the chip was named after the Porsche 911 as an indication of the speed increase it promised. The 86C911 spawned a host of imitators, and by 1995 all major PC graphics chip makers had added 2D acceleration support to their chips. This development was a death sentence for the expensive general-purpose graphics co-processors.

Additional APIs (Application Programming Interfaces) were conceived for a variety of tasks. Two of the first Microsoft APIs were the WinG graphics library for Windows 3.x and the DirectDraw interface (which later became part of DirectX) for hardware-accelerated 2D games in Windows 95 and later operating systems. DirectX and OpenGL are the most common APIs nowadays. What exactly does an API do? It mainly helps hardware and software communicate more efficiently by providing a standard set of instructions for complex 3D rendering tasks. Developers optimize graphics-intensive games for specific APIs, which is why the newest games often require updated versions of DirectX or OpenGL to work correctly.

Note that APIs are different from drivers, which are programs that allow hardware to communicate with a computer's operating system. Just like APIs, updated device drivers can help programs run correctly.

The first mass-marketed 3D graphics hardware appeared in fifth-generation video game consoles such as the PlayStation and the Nintendo 64 in the mid-1990s. Notable early attempts at low-cost 3D graphics chips for the PC were the S3 ViRGE, ATI's Rage and the Matrox Mystique, none of which delivered convincing 3D performance. Initially, performant 3D graphics were possible only with separate add-on boards dedicated to accelerating 3D functions and lacking 2D GUI acceleration entirely; this was the case of the highly acclaimed 3dfx Voodoo. Manufacturing technology inevitably progressed, though, and video, 2D GUI acceleration and 3D functionality were eventually integrated into a single chip; Rendition's Verite chipsets were among the first to offer such all-round graphics support.

When DirectX matured into a reliable API and became one of the leading 3D graphics programming interfaces, the 3D accelerator market exploded. Direct3D 7.0 introduced hardware-accelerated transform and lighting (T&L) support. 3D accelerators thus became more than simple rasterizers, and the first GPU to include a T&L engine was NVIDIA's GeForce 256 (NV10).

The DirectX 8.0 and OpenGL APIs further improved things, and GPUs started to implement programmable shading. Each pixel could now be processed by a short program that could take additional image textures as inputs, and each geometric vertex could likewise be processed by a short program before it was projected onto the screen. By October 2002, with the introduction of the ATI Radeon 9700 (also known as R300), the world's first Direct3D 9.0 accelerator, pixel and vertex shaders could implement looping and lengthy floating-point math and were becoming nearly as flexible as CPUs.

The newest version of DirectX, DirectX 10, will be officially released with Microsoft Windows Vista. NVIDIA's GeForce 8800 series is the first to implement DirectX 10 capabilities and is already available, while ATI's R600 GPU, which will also support DirectX 10, isn't expected on the shelves until January 2007.

[Image gallery: 3dfx Voodoo, ATI Rage, GeForce 8800, GeForce 256, IBM 8514, S3 86C911, Matrox Mystique, S3 ViRGE, Radeon 9700]

We have now taken over your visual system! Time to see how a GPU can render entire worlds. In basic graphics terms, the GPU draws the polygons formed by sets of vertices, attaches textures to these polygons, calculates the lighting values in the scene and then tells your monitor how to display the resulting pixels. In other words, the GPU is really a co-processor that handles the heavy polygon-crunching work and sends the finished image to the monitor. Present-day GPUs also feature post-processing special effects that can be added to the final image with the aid of programmable pixel and vertex shaders. These effects can render environment reflections, bump maps or normal maps, multi-texturing and more in real time without taxing the CPU.
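
To make the vertex-to-pixel part of that description concrete, here is a minimal Python sketch of the arithmetic (it is not how you actually program a GPU, and the matrix values are made-up examples): a 3D vertex is transformed by a combined model-view-projection matrix, divided by w for perspective, and mapped to pixel coordinates on a 1024 x 768 screen.

    # Minimal sketch of what happens to each vertex: transform it with a
    # model-view-projection matrix, divide by w (perspective), then map the
    # result to pixel coordinates. The matrix below is a made-up example.

    def mat_vec(m, v):
        """Multiply a 4x4 matrix (list of rows) by a 4-component vector."""
        return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

    def vertex_to_pixel(vertex, mvp, width, height):
        x, y, z = vertex
        clip = mat_vec(mvp, [x, y, z, 1.0])          # to clip space
        ndc = [clip[i] / clip[3] for i in range(3)]  # perspective divide -> [-1, 1]
        px = (ndc[0] * 0.5 + 0.5) * width            # viewport mapping
        py = (1.0 - (ndc[1] * 0.5 + 0.5)) * height   # flip y: screen origin is top-left
        return px, py

    # A simple perspective projection looking down the -z axis (made-up values).
    mvp = [[1.0, 0.0,  0.0,  0.0],
           [0.0, 1.33, 0.0,  0.0],
           [0.0, 0.0, -1.0, -0.2],
           [0.0, 0.0, -1.0,  0.0]]

    print(vertex_to_pixel((0.5, 0.25, -2.0), mvp, 1024, 768))   # -> (640.0, ~320.2)

The same transform is applied to every vertex of every polygon, which is why the triangles-per-second figures discussed later matter so much.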

But let us take a look at several more basic algorithms performed by the GPU. An important part of computer graphics processing is texture mapping. Texture maps are stored as image files on the system, and the textures are further modified by various techniques as new objects are introduced into the environment. Here are some texture mapping methods (I won't be explaining them all, but a small sketch of one follows the list):

1. Projective Texture Mapping
2. Image Warping
3. Transparency Mapping
4. Surface Trimming
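
As a taste of one of these techniques, here is a tiny hedged Python sketch of transparency mapping under simplifying assumptions: a texel's alpha value decides how much of the texture's color shows through over whatever is already in the frame buffer (standard "source-over" alpha blending). All values are made-up examples.

    # Simplified transparency mapping: a texture's alpha value decides how much
    # of the texel color shows through over what is already on screen
    # (standard "source-over" alpha blending).

    def blend(src_rgb, alpha, dst_rgb):
        """Blend a texture sample over the existing pixel color."""
        return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src_rgb, dst_rgb))

    texture_sample = (0.9, 0.2, 0.2)     # reddish texel color
    texture_alpha = 0.25                 # mostly transparent
    framebuffer_pixel = (0.1, 0.3, 0.8)  # what was already drawn

    print(blend(texture_sample, texture_alpha, framebuffer_pixel))
    # -> (0.3, 0.275, 0.65): 25% texture, 75% background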

After the texture map has been stored, it must be applied to geometry and displayed on screen. A texture map is essentially a grid of texels (texture elements, the texture's equivalent of pixels) that gets stretched over the polygons of a 3D model. Since a screen pixel rarely lines up exactly with a single texel, the question arises of which texels should be used to determine each 2-dimensional pixel. The usual way to picture this is to imagine a ray shot from the pixel through a filter that highlights a group of texels. The main filtering techniques are listed below (a small sketch of bilinear filtering follows the list):

1. Point Sampling
2. Bilinear Filtering
3. Trilinear Filtering
4. Anisotropic Filtering
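
To make the filtering idea concrete, here is a small Python sketch of bilinear filtering under simplifying assumptions (a tiny grayscale texture stored as a list of rows, coordinates already converted to texel space and kept inside the texture): the four texels surrounding the sampling point are read and blended with weights given by the point's fractional position.

    # Bilinear filtering sketch: read the four texels around the sampling point
    # and blend them according to how close the point is to each one. The
    # "texture" is a tiny made-up grayscale image; u and v are in texel space.
    import math

    def bilinear(texture, u, v):
        h, w = len(texture), len(texture[0])
        x0, y0 = int(math.floor(u)), int(math.floor(v))
        fx, fy = u - x0, v - y0                         # fractional position
        x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
        top    = texture[y0][x0] * (1 - fx) + texture[y0][x1] * fx
        bottom = texture[y1][x0] * (1 - fx) + texture[y1][x1] * fx
        return top * (1 - fy) + bottom * fy

    texture = [[0.0, 0.2, 0.4],
               [0.2, 0.5, 0.7],
               [0.4, 0.7, 1.0]]

    print(bilinear(texture, 0.5, 0.5))   # halfway between four texels -> 0.225
    print(bilinear(texture, 2.0, 2.0))   # exactly on a texel          -> 1.0

Trilinear filtering does the same thing twice, on two neighboring mipmap levels, and blends the two results.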

As anisotropic filtering has become a standard feature of modern GPUs, let us see how it works. The anisotropic approach uses all the texels that lie inside the highlighted region and averages them, taking into account the angle at which the polygon appears on the screen. It therefore treats the highlighted region as an ellipse whose eccentricity changes with that angle, which gives a much better average of the texels. The operation is more complex and consumes more bandwidth, however. Current GPUs support anisotropic filtering levels of up to 16x, meaning up to 16 filtered samples can be taken from the texture map for a single pixel. This algorithm provides crisp visuals in games with huge worlds, keeping the details of the texture map from blurring when surfaces are viewed at shallow angles or from far away.
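
Here is a deliberately simplified Python sketch of that idea (this is not how any particular GPU implements it, and the numbers are made up): from the rate at which the texture coordinates change across the pixel, it estimates the long and short axes of the pixel's footprint in the texture, derives a sample count from their ratio (capped at 16), and averages point samples taken along the long axis. Real hardware uses bilinear or trilinear taps instead of point samples and far smarter footprint estimates.

    # Very simplified anisotropic filtering: estimate the pixel's footprint in
    # texel space from the texture-coordinate derivatives, then average several
    # point samples taken along the footprint's long axis.
    import math

    def point_sample(texture, u, v):
        h, w = len(texture), len(texture[0])
        x = min(max(int(round(u)), 0), w - 1)
        y = min(max(int(round(v)), 0), h - 1)
        return texture[y][x]

    def anisotropic(texture, u, v, dudx, dvdx, dudy, dvdy, max_samples=16):
        len_x = math.hypot(dudx, dvdx)        # footprint size along screen x
        len_y = math.hypot(dudy, dvdy)        # footprint size along screen y
        major, minor = max(len_x, len_y), max(min(len_x, len_y), 1e-6)
        samples = min(max(int(math.ceil(major / minor)), 1), max_samples)
        du, dv = (dudx, dvdx) if len_x >= len_y else (dudy, dvdy)
        total = 0.0
        for i in range(samples):              # step along the long axis
            t = (i + 0.5) / samples - 0.5     # offsets from -0.5 to +0.5
            total += point_sample(texture, u + t * du, v + t * dv)
        return total / samples

    # A diagonal gradient texture seen at a shallow angle: the texture
    # coordinates change much faster along screen x than along screen y.
    texture = [[(x + y) / 12.0 for x in range(7)] for y in range(7)]
    print(anisotropic(texture, 3.0, 3.0, dudx=4.0, dvdx=0.0, dudy=0.0, dvdy=0.5))
    # -> 0.5 (the average of 8 samples spread along the long axis)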

[Images: anisotropic filtering in action; a wireframe made of a huge number of polygons; a texture mapped onto a wireframe]

I won't get into transform & lighting techniques or pixel & vertex shaders, because covering all of those would take entire articles; just look them up on the adorable Wikipedia if you really want to know more. I'll just mention that, by combining all these basic and post-processing algorithms, GPU renderings get closer and closer to photorealism with each generation.

Know the dark powers of your GPU

How do we know whether our graphics monster performs well in present-day applications and games? One method is to find out the average frame rate your GPU can sustain. Frame rate is measured in frames per second (FPS), and there are special programs that display an FPS counter while you play the latest games, as well as benchmarks that evaluate the processing power of the GPU.

The frame rate describes the number of complete images the card can display per second. The human eye starts perceiving motion as fluid at roughly 24 to 30 frames per second, but fast-action games require a frame rate of at least 60 FPS to provide smooth animation and scrolling. The frame rate works hand in hand with other figures, the two most important being the number of triangles or vertices processed per second and the pixel fill rate.
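
The arithmetic behind an FPS counter is trivial: count the frames drawn in a time window and divide by the elapsed seconds. Here is a hedged Python sketch in which the rendering of a frame is faked with a short sleep:

    # Minimal FPS-counter arithmetic: count frames over a time window and
    # divide by the elapsed seconds. The "rendering" is faked with a sleep;
    # in a real game the measured work would be drawing one frame.
    import time

    def measure_average_fps(render_one_frame, seconds=1.0):
        frames = 0
        start = time.perf_counter()
        while time.perf_counter() - start < seconds:
            render_one_frame()
            frames += 1
        elapsed = time.perf_counter() - start
        return frames / elapsed

    fake_render = lambda: time.sleep(0.016)   # pretend each frame takes ~16 ms
    print("average FPS: %.1f" % measure_average_fps(fake_render))
    # roughly 60 FPS: one frame every ~16 ms is the usual target for action games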

As 3D images are primarily built from triangles, or polygons, the triangles-per-second figure describes how quickly the GPU can process whole polygons, or the vertices that define them; in general, it describes how quickly the card can build a wireframe image. The pixel fill rate, on the other hand, tells us how many pixels the GPU can process per second, which translates into how quickly it can rasterize the image; the closely related texture fill rate counts texels instead, and current GPUs such as NVIDIA's G80 can fetch up to 36.8 billion texels per second.
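
These headline figures are simple products of clock speed and unit counts. The hedged Python sketch below plugs in the numbers commonly quoted for the GeForce 8800 GTX (575 MHz core clock, 24 raster output units, 64 texture filtering units) to reproduce its theoretical fill rates:

    # Theoretical fill rates are just the clock speed multiplied by the number
    # of units that can each handle one pixel or texel per clock. The figures
    # below are the commonly quoted GeForce 8800 GTX specifications.
    core_clock_hz = 575e6       # 575 MHz
    rops = 24                   # raster output (pixel write) units
    texture_units = 64          # texture filtering units

    pixel_fill_rate = core_clock_hz * rops            # pixels per second
    texel_fill_rate = core_clock_hz * texture_units   # texels per second

    print("pixel fill rate: %.1f Gpixels/s" % (pixel_fill_rate / 1e9))   # 13.8
    print("texel fill rate: %.1f Gtexels/s" % (texel_fill_rate / 1e9))   # 36.8

Real-world throughput is lower, of course, since memory bandwidth and shader work usually become the bottleneck before the theoretical fill rate is reached.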

The graphics card's hardware specifications directly affect its speed. These are the ones that matter most, along with the units they are measured in (a small worked example follows the list):

* GPU clock speed (MHz)
* Size of the memory bus (bits)
* Amount of available memory (MB)
* Memory clock rate (MHz)
* Memory bandwidth (GB/s)
* RAMDAC speed (MHz)
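
Some of these figures follow directly from the others; memory bandwidth, for instance, is just the bus width (in bytes) multiplied by the effective memory transfer rate. The hedged Python sketch below uses the figures commonly quoted for the GeForce 8800 GTX (384-bit bus, 900 MHz GDDR3, which transfers data twice per clock):

    # Memory bandwidth = (bus width in bytes) x (effective transfer rate).
    # GDDR3 moves data twice per clock, so a 900 MHz memory clock means
    # 1800 million transfers per second. Figures are the commonly quoted
    # GeForce 8800 GTX specifications.
    bus_width_bits = 384
    memory_clock_hz = 900e6
    transfers_per_clock = 2     # double data rate

    bandwidth_bytes_per_s = (bus_width_bits / 8) * memory_clock_hz * transfers_per_clock
    print("memory bandwidth: %.1f GB/s" % (bandwidth_bytes_per_s / 1e9))   # 86.4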

The latest graphics cards allow users to overclock some of these parameters. By manually raising the GPU clock or the memory clock, you can get a nice performance boost; this is done either through the graphics board's BIOS or through special functions exposed by the graphics card drivers. People usually overclock the memory, since overclocking the GPU core is more likely to lead to overheating. Keep in mind that while overclocking can bring better performance, it can also void the manufacturer's warranty.

That would be enough about graphics cards and GPUs in general. For the next article, I figured it would be a good idea to present the various types of displays. CRTs, LCDs and plasmas, as well as non-standard types and future concepts, all await you tomorrow, so be sure to check back.