HBM tech is here, but it doesn't impress... yet

Jul 20, 2015 17:01 GMT  ·  By

Many years have passed since AMD managed to catch NVIDIA off-guard with the underwhelming 200 series, and in the meantime AMD has done nothing but lose ground to most of the NVIDIA cards that followed. Now it wants to come back with cutting-edge memory technology and end NVIDIA’s “Maxwell” supremacy once and for all. Can it do it? We’ll see in this review.

Two weeks after launching its flagship, the liquid-cooled Fury X, AMD released its smaller, more affordable sibling, the air-cooled R9 Fury (no X). Considered the main money-maker of the two, it is based on the same “Fiji” silicon die and brings the same SK Hynix HBM (High Bandwidth Memory) technology that made the X version famous, so the R9 Fury comes with pretty much the same specs as the Fury X.

It has 224 TMUs (Texture Mapping Units), 64 ROPs (Render Output Units), and 4 GB of memory across a 4096-bit wide HBM interface. The core is clocked at 1000 MHz, and the memory at 500 MHz. Compared with the Fury X, the performance trimming comes from the 512 fewer stream processors and the 50 MHz lower core clock speed.
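To put the trimming in perspective, here is a minimal sketch of the arithmetic. The 3584 stream processors come from the spec table below, the Fury X figures are derived from the differences quoted above, and the idea that shader throughput scales linearly with stream processors times clock is our simplifying assumption; real games will not scale this neatly.

```python
# R9 Fury figures as quoted in this review.
FURY_SPS = 3584
FURY_CLOCK_MHZ = 1000

# Fury X figures derived from the stated deltas (512 more SPs, 50 MHz higher clock).
FURY_X_SPS = FURY_SPS + 512
FURY_X_CLOCK_MHZ = FURY_CLOCK_MHZ + 50


def relative_shader_throughput(sps, clock_mhz, ref_sps, ref_clock_mhz):
    """Ratio of (stream processors x clock) products.

    Assumption: throughput scales perfectly linearly with both factors.
    """
    return (sps * clock_mhz) / (ref_sps * ref_clock_mhz)


ratio = relative_shader_throughput(
    FURY_SPS, FURY_CLOCK_MHZ, FURY_X_SPS, FURY_X_CLOCK_MHZ
)
print(f"Fury X (derived): {FURY_X_SPS} SPs at {FURY_X_CLOCK_MHZ} MHz")
print(f"R9 Fury theoretical shader throughput: {ratio:.1%} of the Fury X")
```

Under that assumption the R9 Fury lands at roughly 83% of the Fury X on paper.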

The main market advantage of the R9 Fury is that, unlike the Fury X, its SKU will not be sold with a mandatory reference design. AMD has allowed its two exclusive custom PCB partners, ASUS and Sapphire, to custom-tune the card and make it as affordable as possible using conventional air-based cooling solutions. Although the card has very serious potential to outsell both the Fury X and NVIDIA cards, AMD has chosen to give R9 Fury exclusivity to only these two manufacturers, which is a strange way of capitalizing on such a potent card.

The board we’re reviewing here is the ASUS Radeon R9 Fury STRIX, equipped with the DirectCU III triple-fan cooling solution also found on the Radeon R9 390X and GTX 980Ti. Although ASUS does not overclock this card by default, it’s still asking slightly more money than AMD’s suggested retail price.

                    ASUS R9 Fury STRIX    GeForce GTX 980 Ti
Shader Units        3584                  2816
ROPs                64                    96
Graphics Processor  Fiji                  GM200
Transistors         8900M                 8000M
Memory Size         4096 MB               6144 MB
Memory Bus Width    4096 bit              384 bit
Core Clock          1000 MHz              1000 MHz+
Memory Clock        500 MHz               1750 MHz
Price               $579                  $650

High Bandwidth Memory Explained

We can’t review the R9 Fury without giving a proper explanation of what HBM means and how it actually works. The whole package combines the GPU, the Interposer and the memory: a TSMC-built 28nm GPU die sits on a specially designed silicon substrate layer that connects the GPU with the HBM memory stacks.

The silicon substrate, called the Interposer, is built by UMC on a 65nm process, while the four 1GB HBM stacks are built by SK Hynix on a 20nm node. The Fury GPU design is the result of a collaborative effort between at least five VLSI (Very-Large-Scale Integration) companies: UMC, ASE Group, TSMC, SK Hynix and Amkor Technology, all of which contributed to the new cutting-edge memory system.

The need for a post-GDDR5 memory system arose when it became clear that the old layout of memory chips on the PCB demanded more and more power as memory clocks increased. Not only did the chips occupy more and more space on the PCB, but the power was also wasted through disproportionate consumption between the chips and the logic die.

Put that together with the increased bandwidth of new GPUs, which also requires more power, and you’ve got yourself a mishmash of modules poorly optimized for power consumption that do not deliver the promised performance even though the hardware is there. Power efficiency thus became a keyword for AMD, and achieving it required a new memory architecture.

HBM tech described in detail

HBM enhances overall energy efficiency by trading frequency for bus width, hence the large 4096-bit interface. It allows vertical stacking of up to four DRAM dies (for now!), with a fifth die that holds the PHY sitting on the Interposer and connecting the stack with the GPU. This sort of extreme proximity between key modules simplifies power delivery.
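The frequency-for-width trade can be illustrated with a back-of-the-envelope calculation using the bus widths and memory clocks from the spec table above. The transfers-per-clock values (2 for HBM, 4 for GDDR5 relative to the listed clock) are our assumption and are not stated in this review, so treat the results as a rough sketch of peak figures rather than measured bandwidth.

```python
def peak_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in GB/s.

    bandwidth = bytes per transfer x transfers per second
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Bus width and memory clock come from the spec table in this review.
# Assumption (not from the review): HBM transfers twice per clock,
# GDDR5 four times per listed clock.
hbm = peak_bandwidth_gb_s(bus_width_bits=4096, clock_mhz=500, transfers_per_clock=2)
gddr5 = peak_bandwidth_gb_s(bus_width_bits=384, clock_mhz=1750, transfers_per_clock=4)

print(f"R9 Fury (HBM):      {hbm:.0f} GB/s")
print(f"GTX 980 Ti (GDDR5): {gddr5:.0f} GB/s")
```

Under these assumptions the sketch gives 512 GB/s for the R9 Fury and 336 GB/s for the GTX 980 Ti: the much wider bus more than makes up for the far lower memory clock.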

Another reason for placing the HBM memory stacks on the Interposer is that each HBM stack features a 1,024-bit memory bus, which would require an enormous number of pins if it were laid out on a classical PCB. Placing the stacks on the Interposer removes the need for those pins, as the substrate deals with the massive bus size much better than conventionally arranged pins. More than that, the Interposer is covered in microscopic bumps that stand in for the wiring that would otherwise have been spread all over the PCB, and they connect the memory stacks to the GPU. This keeps all the action on the GPU package instead of across the PCB, reducing the overall footprint to as little as a third of a classical GDDR5 layout.

Packaging and Components

We received the ASUS card neatly packaged in a stylish black box bearing the STRIX logo. Underneath it, we found the documentation and driver CD together with a PCI-Express power cable and a fancy ASUS STRIX sticker. All this was held in a three-piece Styrofoam insert that fit nice and tight inside the outer box.

The card itself

The first impression of the card is nothing less than awe-inspiring. It’s one of the largest cards, if not THE largest, we’ve ever seen. We were struck by the obvious contradiction between the advertised space savings on the PCB, as more is crammed onto the logic die, and the sheer size of the card itself. And that is without taking the huge ASUS cooling solution into account: the PCB alone is wider and longer than those of both the GTX 980Ti and the Titan X.

A chrome and red monster

Display connectivity options include one DVI port, one HDMI port and three DisplayPorts. The HDMI port is version 1.4, and the card includes a sound device that supports HD audio, Blu-ray and 3D movies. Moreover, there is no physical CrossFire connector, as CrossFire data is now sent over the PCI-Express bus (NVIDIA’s SLI, by contrast, still uses a bridge connector).

The cooling solution is as impressive as on every other ASUS STRIX card. The massive DirectCU III fans cool five massive heat pipes that make direct contact with the GPU for as much heat transfer as possible. Power-wise, the R9 Fury eats 375W, fed through 8-pin PCI-Express power connectors.

Test Setup

OS           Windows 7 Professional 64-bit SP1
CPU          Intel Core i7 3820 @ 3.60GHz, Sandy Bridge-E, 32nm technology
RAM          Kingston HyperX 8.00GB DDR3 @ 667MHz (9-9-9-24)
Motherboard  ASUSTeK COMPUTER INC. RAMPAGE IV EXTREME (LGA2011)
Graphics     AMD Radeon R9 Fury 4GB Hynix HBM
Monitor      DELL P2311H (1920x1080@60Hz)
Storage      111GB ATA KINGSTON SH100S3 SCSI Disk Device (SSD); 465GB ATA WDC WD5001AALS-0 SCSI Disk Device (SATA)

We tested the R9 Fury using 3DMark Advanced, The Witcher 3: Wild Hunt, Metro Last Light Redux and Middle-Earth: Shadow of Mordor on Ultra settings on a Dell 23” Full HD monitor. Where supersampling was available, as in Shadow of Mordor, we could see how the R9 Fury tackled 4K resolutions in gaming conditions. We also want to mention that, for unknown reasons, Battlefield 4 didn’t start at all with the R9 Fury in our system, even though it worked flawlessly with our previously tested GTX 980Ti as well as a GTX 760.

However, the true limits of this card were reached in the 4K 3DMark Fire Strike Ultra test, which as standard calls for dual GTX 980s or Titan Xs in SLI together with an overclocked, latest-generation i7-4790K. As 3DMark was able to force 4K rendering on our R9 Fury even though our monitor couldn’t display it, we could see the massive bottleneck an i7 from Q1 2012 creates at 4K resolutions.

3DMark Advanced

Our 3DMark Advanced benchmarking software was updated to its latest version, 1.5.915, on June 5, a month before the R9 Fury was launched. And yet neither 3DMark nor our GPU Meter software could properly detect our driver. Although 3DMark could still give us accurate information on what to expect from the R9 Fury, it’s clear that our latest Catalyst 15.7 driver wasn’t fully recognized. The funny part is that although 3DMark Advanced explicitly required version 15.7 to be installed for compatibility with the latest R9s, our graphics card remained unrecognized.

Fire Strike High Performance

We ran all our 3DMark tests on our previously reviewed GTX 980Ti from NVIDIA as well, so we can better assess the performance differences between these two direct competitors from AMD and NVIDIA.

The most forgiving FireStrike benchmark demo starts with good performance for both cards at Full HD resolution. Average framerates in the graphics tests go neck and neck between the GTX and the R9 Fury, but the 980Ti ultimately takes the lead in … with an indisputable 10 frames ahead of the Fury, and the final scores are 12478 points for the NVIDIA card and 11384 for the R9 Fury. While the shader-oriented demos make the GTX 980Ti a clear winner, it’s the physics test that puts both cards under serious stress, with both delivering almost exactly the same 30fps average and no clear winner.

FireStrike Extreme

FireStrike Extreme raises the resolution to 2560 x 1440, increasing the tessellation volume and dynamic particle illumination. The Physics test runs 32 parallel simulations of soft and rigid body physics on the CPU, which is why the framerates do not differ much between the two cards.

As expected, the score drops further, to 5896 for the R9 Fury, with the graphics tests averaging 31fps and the combined physics and shader stress test going as low as 12fps. The GTX 980Ti does a little better, with a score of 6983, a clear 10fps advantage in the shader stress demo and results similar to the Fury’s in the physics tests. The combined physics and shader test also declares the GTX 980Ti the final winner, with 16fps against the Fury’s 12fps.

FireStrike Ultra

This is the ultimate and the most demanding 3DMark test. No ordinary computer will be able to handle this demo as it boosts graphical resolutions to full 4K. At least 3 GB of video memory is required to start the Ultra test by default, but it seems this was not enough to reach FPS rates higher than 30 at 3840 x 2160 resolution in any scenario.

Here the Fury managed to gather a humble 3515 points, with 17fps at best during the shader tests, placing it ahead of only 53% of tested systems. This is no small matter, however, as the 4K gaming PC tab shows what sort of systems have been tested on this very demanding benchmark: they come with dual GTX 980 GPUs and the latest Haswell K-series CPUs from Intel. True monsters.

Yet the physics tests were handled much better by the R9 Fury: 30fps was easily reached and the framerate stayed stable throughout the physics demo. Unfortunately, when the R9 Fury had to pass the combined shader and physics stress test, the framerate crumbled to a lowly 8fps.

The GTX 980Ti scored very similarly to the Fury in our 4K tests, although again gathering slightly more points: 3779 to the Fury’s 3515. NVIDIA’s card held only a very weak lead this time, with just a couple more frames in the shader tests, which again proves that neither card is fully suited to full-blown 4K rendering of shader-loaded games. Not alone, at least: two Furies in CrossFire or two 980Tis in SLI will make 4K entertainment a much more realistic endeavor.

All tests but the Physics will have the GTX 980Ti slightly ahead
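For a quick sense of how wide the gaps are, the overall scores quoted above can be turned into percentage leads. This is only a sketch over the three totals reported in this review; the helper name and the dictionary layout are ours.

```python
# Overall 3DMark scores quoted in this review: (GTX 980Ti, R9 Fury).
SCORES = {
    "Fire Strike": (12478, 11384),
    "Fire Strike Extreme": (6983, 5896),
    "Fire Strike Ultra": (3779, 3515),
}


def lead_percent(leader, trailer):
    """How far ahead the leader is, as a percentage of the trailing score."""
    return (leader - trailer) / trailer * 100


leads = {
    test: lead_percent(gtx, fury) for test, (gtx, fury) in SCORES.items()
}

for test, lead in leads.items():
    print(f"{test}: GTX 980Ti ahead by {lead:.1f}%")
```

By this measure the GTX 980Ti leads by roughly 10% at Full HD, 18% in Extreme and 8% in Ultra, which matches the impression that the two cards are closest at 4K.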

Witcher 3 Wild Hunt

Developed by Polish studio CD Projekt RED, the Witcher 3 follows the adventures of Geralt of Rivia in a fantasy world created by Andrzej Sapkowski. In our review of The Witcher 3, we praised the quest system and the free roaming in a fantasy world within a beautiful medieval environment offered by one of the most graphically demanding games on the market.

The R9 Fury handled the game very well, almost flawlessly, with no immediate issues and a constant 60fps in most scenarios and cut scenes. Nonetheless, this being an NVIDIA-dedicated title, some issues did come to light. The first is the poorly supported rendering of Geralt’s hair. Because it relies on an NVIDIA feature called NVIDIA HairWorks, Geralt’s hair didn’t flow as gracefully as on NVIDIA cards and lost collision with Geralt’s character model several times. It’s not a serious matter, to be honest, as the feature is just a simple marketing tool to promote the NVIDIA brand through the game.

Radeon R9 Fury (left) vs GTX 980Ti (right)

Another, more pressing issue was performance inconsistency between scenes. Some scenes enjoyed a solid 58-60fps, while others fell to a paltry 25fps for no apparent reason. All available updates were applied to both the game and the Catalyst driver, and the issues still remained.

Another problem was that new settings were not applied immediately. After switching everything to Ultra with VSync off and Maximum Frames per Second on, we encountered severe screen tearing and framerate issues that were rectified only by restarting the game.

However, we must admit that, surprise FPS drops aside, the overall performance was a consistent 60+fps, which couldn’t be said of the GTX 980Ti. The difference in framerates isn’t large, and the GTX 980Ti was more stable, but the winner in absolute FPS terms is the R9 Fury.

Metro Last Light Redux

The successor to the highly praised Metro 2033, Metro Last Light tells the story of Artyom, a survivor of the nuclear apocalypse who travels from one human outpost to another through Moscow’s dilapidated subway tunnels, trying to find the source of the mysterious Dark Ones.

Metro Last Light is again an NVIDIA-dedicated title and, unlike with Witcher 3, this quickly becomes obvious. On Ultra settings at 1080p, framerates never went higher than 45fps. It’s never unplayable, but you can feel framerate drops at every step. Metro Last Light is a shader-heavy title packed with smoke physics as well. It stresses your stream processors rather than your HBM memory, and while the card renders the game decently, framerates could become an issue during intense firefights.

You can blame the game for poor optimization for AMD cards. However, the R9 Fury comes with 3584 shader units compared with the GTX 980Ti’s 2816 CUDA cores, so the raw extra performance should, in theory, overcome the lack of optimization. In practice, it doesn’t. We can’t tell whether having 32 fewer ROPs than the GTX 980Ti is to blame, but the R9 Fury should perform much better.

Middle Earth: Shadow of Mordor

Monolith’s latest title follows Talion’s non-canon post-mortem adventure inside Mordor as he hunts Sauron’s orcish captains in order to climb the ladder of the Nemesis System and defeat Sauron.

Shadow of Mordor has an internal benchmarking system that lets users test how the game runs at different settings. The game also allows choosing internal resolutions of up to 4K (3840 x 2160) by supersampling above your monitor’s native resolution.

Middle-Earth: Shadow of Mordor settings

The 1080p Ultra settings show excellent performance, with framerates going as high as 123fps during the internal benchmark test. Raising the resolution to 4K drops the framerate to 28fps, without making the game unplayable. What’s truly impressive is that the R9 Fury holds the 28-30fps threshold at 4K with Ultra settings on, which is no small feat.
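To see why that 4K result is respectable, it helps to compare pixel counts with framerates. The sketch below uses the resolutions and framerates quoted in this review; note that 123fps is a peak figure while 28fps is the 4K figure, so this is an indicative comparison under that caveat, not a like-for-like average. The variable names are ours.

```python
def pixel_count(width, height):
    """Number of pixels rendered per frame at a given resolution."""
    return width * height


full_hd = pixel_count(1920, 1080)
ultra_hd = pixel_count(3840, 2160)

# Framerates quoted in this review (peak at 1080p, 4K figure).
FPS_1080P = 123
FPS_4K = 28

pixel_ratio = ultra_hd / full_hd      # how many times more pixels at 4K
fps_ratio = FPS_1080P / FPS_4K        # how many times lower the framerate is

# Pixels pushed per second, in millions.
mpix_1080p = full_hd * FPS_1080P / 1e6
mpix_4k = ultra_hd * FPS_4K / 1e6

print(f"4K has {pixel_ratio:.0f}x the pixels of Full HD")
print(f"Framerate drops by a factor of {fps_ratio:.2f}")
print(f"Pixel throughput: {mpix_1080p:.1f} Mpix/s at 1080p, {mpix_4k:.1f} Mpix/s at 4K")
```

With four times the pixels and a framerate drop of about 4.4x, the card pushes nearly as many pixels per second at 4K as it does at its 1080p peak.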

Shadow of Mordor isn’t a shader-heavy game, but as an open-world title it does need serious memory speed and capacity. The new HBM tech from AMD runs this game flawlessly, with framerates going well over the 100fps threshold at Full HD resolution.


The Good

The Radeon R9 Fury is an excellent video card that treads new territory with its cutting-edge HBM memory stacking technology for more efficient power distribution. It manages to stay very close to the GTX 980Ti performance-wise while being about $70 cheaper, going by the $579 and $650 prices listed in our spec table.
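As a rough check on the value argument, the prices from our spec table can be set against the Full HD Fire Strike scores reported earlier. This is a sketch based on a single synthetic benchmark, so it should not be read as a general price-to-performance verdict; the dictionary names are ours.

```python
# Prices and Full HD Fire Strike scores as quoted in this review.
PRICE_USD = {"R9 Fury STRIX": 579, "GTX 980 Ti": 650}
FIRE_STRIKE = {"R9 Fury STRIX": 11384, "GTX 980 Ti": 12478}

price_gap = PRICE_USD["GTX 980 Ti"] - PRICE_USD["R9 Fury STRIX"]
price_gap_percent = price_gap / PRICE_USD["GTX 980 Ti"] * 100

# Benchmark points obtained per dollar spent.
points_per_dollar = {
    card: FIRE_STRIKE[card] / PRICE_USD[card] for card in PRICE_USD
}

print(f"Price gap: ${price_gap} ({price_gap_percent:.1f}% below the GTX 980 Ti)")
for card, value in points_per_dollar.items():
    print(f"{card}: {value:.2f} Fire Strike points per dollar")
```

On these numbers the gap is $71, or about 11%, and the R9 Fury comes out marginally ahead in points per dollar (roughly 19.7 against 19.2).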

The first-generation HBM memory size is, however, underwhelmingly small at only 4GB, while NVIDIA’s GDDR5 cards come in 6 and 12GB flavors. The performance boost from better power optimization in the HBM technology is considerable enough to compensate for the smaller memory size. A performance gap in AMD’s favor will probably only be felt, however, when HBM2 hits the market sometime next year.

The Bad

Typical AMD issues plague the R9 Fury as well. Bad game optimizations and general compatibility issues between the card’s driver and different applications will cause extra hassle for users who want immediate, out-of-the-box fun. Also, some games might not boot at all if they have deficient launchers, while others will require restarts and extra tweaks to run properly on the new card.

Adding to that is the possibility of “sabotage” by NVIDIA, which might convince developers to implement features unique to NVIDIA cards that could hinder the performance of AMD’s cards.

We also had the unpleasant surprise of having the card crash on us while we attempted to overclock it by 10% (GPU clock) using the driver’s built-in overclocking feature. The card managed 10 seconds of improved performance in 3DMark’s FireStrike Extreme graphics test before going into a BSOD. We didn’t try the operation again.

Conclusion

The R9 Fury is a great card for the present; it comes with a cutting-edge high-bandwidth memory system and offers a new GPU architecture that is, in theory, way ahead of what NVIDIA offers at the moment. It’s an excellent proof of concept of how HBM has the potential to overcome all the limitations of GDDR5 with less memory and more power efficiency. However, the limitations imposed by poor optimization in various games, quirky drivers and bad overclocking abilities will make this card unattractive for some people.

We believe that in the end the R9 Fury could really compete with the GTX 980Ti if AMD pushed the price of this card a bit lower. Cooling solutions for the GTX 980Ti are plentiful and come at lower prices. Keeping in mind that the GTX 980Ti is faster and more stable than the Fury at stock, it’s clear that AMD’s card will have no chance of competing with it once overclocking comes into play.

Overall, the R9 Fury will sell better than the Fury X, as it is much quieter and cheaper while packing roughly the same performance, but to really take on NVIDIA’s Maxwells, AMD fans should wait for HBM2 to close the gap. While NVIDIA’s Pascal architecture is still in the far future, AMD has a real chance of killing a giant, again.
