Last week, AMD launched its largest midrange hardware update in years, codenamed Polaris. Polaris isn’t a brand-new architecture — that’s Vega, which arrives late this year — but it’s arguably a larger update than anything we’ve seen from AMD since the original GCN debuted in late 2011. GCN 1.1 (Bonaire, Hawaii) and 1.2 (Tonga, Fiji) both improved on the original microarchitecture and integrated additional heterogeneous compute capabilities, but both were fairly modest improvements in the grand scheme of things. Polaris aims to deliver larger improvements across the entire GPU stack, while keeping some of the features first introduced with AMD’s Fury family of products.
The RX 480 is AMD’s first 14nm FinFET GPU, and it brings a number of improvements to the table. HDMI 2.0b and DisplayPort 1.3 and 1.4 are all supported, as are emerging features like High Dynamic Range (HDR) displays, FreeSync (via both DisplayPort and HDMI), and a new H.265 / HEVC decoder block with support for up to 1080p240, 1440p120, or 4K60 (that’s the resolution followed by the maximum frame rate). DVI users will need an adapter if they want to use that connector — unlike many of AMD’s older parts in this price range, the RX 480 packs one HDMI 2.0b port and three DisplayPorts.
Normally, we dive into the architectural details of a new GPU design first and tackle the market positioning later. In the RX 480’s case, however, AMD has chosen to lead with a midrange product that targets mainstream enthusiasts rather than launching high-end hardware first, with midrange parts launching later. AMD’s entry and midrange products are definitely due for a refresh, but it’s important to put the RX 480 in perspective — at $199 and $239 for the 4GB and 8GB versions of the RX 480, AMD isn’t trying to overtake the likes of the R9 Fury X or GTX 980 Ti. Instead, the company’s goal was to create a GPU that would offer improved performance and significantly better power consumption for the majority of users.
Based on what we’ve seen so far, it’s succeeded, though AMD’s decision to launch into the mass market first makes it a bit trickier to put the RX 480 in proper context. Nvidia’s two major competitors to the RX 480, at least for now, are the GTX 960 and 970, but neither is a clean match — the cheapest GTX 960s are well under the $200 mark, while the GTX 970 currently starts around $265. While a GTX 1060 is rumored to be launching in the very near future, Nvidia’s GTX 1080 and 1070 remain very thin on the ground — we’ll have to wait and see if Nvidia can ship a 1060 in significant volume.
While we’ve included both the GTX 960 and 970 in this review, we’ve decided to evaluate the RX 480 alongside AMD’s previous GPUs in the same price bracket. Over the past six years, the company has launched a number of cards between $200 and $240 — if you own an R9 380, R7 270X, or even an HD 6870, is the RX 480 a worthy upgrade? How does it compare against the R9 390, AMD’s current 8GB Hawaii-derived GPU?
The RX 480 packs 36 compute units (CUs) with 64 cores per CU and 2304 cores in total. There are 144 texture units and 32 ROPs in the full configuration, backed up by a 2MB L2 cache and a 256-bit GDDR5 memory bus. At a high level, the chip doesn’t look terribly different from previous-generation GCN architectures, but there are significant improvements under the hood.
One of the differences between AMD and Nvidia GPUs has been their ability to handle extremely high levels of scene geometry. Nvidia cards have typically outperformed their AMD counterparts in this regard, though the real-world usefulness of their capabilities has been questionable. The RX 480 introduces a new feature, primitive discard acceleration, which is designed to close the gap in small triangle performance.
By throwing out triangles earlier in the rendering process, the RX 480 can save bandwidth and reduce the performance hit it takes with MSAA enabled. Overall primitive throughput should be higher with the RX 480 than what we saw with earlier cards, despite the fact that the RX 480 has the same maximum number of primitives per clock (four) as last year’s Fiji products.
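Conceptually, primitive discard works like an early cull pass: triangles that can never contribute a pixel are dropped before they reach the rasterizer. The sketch below is a purely illustrative model — the area threshold, data layout, and function names are our own invention, not AMD’s actual hardware logic.

```python
# Illustrative model of primitive discard acceleration: cull triangles
# that cover no sample points (degenerate or sub-pixel) before rasterization.
# The 'min_area' threshold is made up for demonstration purposes.

def signed_area(v0, v1, v2):
    """Twice the signed area of a screen-space triangle (cross product)."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])

def discard_primitives(triangles, min_area=0.5):
    """Keep only triangles large enough to plausibly cover a sample."""
    kept = []
    for tri in triangles:
        if abs(signed_area(*tri)) / 2.0 >= min_area:
            kept.append(tri)
    return kept

tris = [
    ((0, 0), (10, 0), (0, 10)),     # large triangle: kept
    ((5, 5), (5.1, 5), (5, 5.05)),  # sub-pixel sliver: discarded
    ((1, 1), (2, 2), (3, 3)),       # degenerate (collinear): discarded
]
print(len(discard_primitives(tris)))  # 1
```

The win is that discarded triangles never consume rasterizer or memory bandwidth — which is exactly where MSAA’s extra sample traffic hurts.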
Polaris’ second major improvement is to shader efficiency and instruction caching. Polaris can now speculatively prefetch instructions, which should reduce pipeline stalls and boost total performance. Speculative prefetching has been used in CPUs for decades to generally good effect, though it’s important to balance the feature’s power consumption against the performance it adds.
Finally, there’s Polaris’ improved support for delta color compression. AMD didn’t go into as much detail as Nvidia did during its Pascal discussion, but the company’s high-level data suggests significant improvements in overall bandwidth efficiency. By compressing color data, AMD can squeeze more effective performance out of the same raw bandwidth (224GB/s on the 4GB RX 480, 256GB/s on the 8GB card).
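To see why this helps, consider a toy delta scheme: most rendered surfaces change gradually, so storing one anchor value plus small per-pixel deltas takes far fewer bits than storing raw values. The bit widths and block format below are invented for illustration — AMD hasn’t published its exact encoding.

```python
# Toy delta color compression: encode a block as one 8-bit anchor plus
# narrow signed deltas when the data is smooth; fall back to raw storage
# when it isn't. Real hardware schemes are lossless and more sophisticated.

def compressed_bits(block, delta_bits=4):
    """Bits needed for a block if every delta fits in delta_bits, else raw size."""
    raw = len(block) * 8                    # 8 bits per raw value
    limit = 1 << (delta_bits - 1)           # signed delta range: [-limit, limit)
    deltas = [b - a for a, b in zip(block, block[1:])]
    if all(-limit <= d < limit for d in deltas):
        return 8 + (len(block) - 1) * delta_bits  # anchor + packed deltas
    return raw                              # incompressible: store raw

smooth = [100, 101, 103, 102, 104, 105, 104, 103]  # gradient-like data
noisy  = [100, 7, 220, 64, 13, 255, 2, 90]         # high-frequency data
print(compressed_bits(smooth))  # 36 bits vs. 64 raw
print(compressed_bits(noisy))   # 64 (falls back to raw)
```

Since compression is lossless and transparent to software, the effective bandwidth gain depends entirely on how compressible the frame’s contents happen to be.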
Ever since AMD first unveiled Polaris at its Sonoma event last winter, the company has claimed that 14nm FinFET and new design elements would help it deliver as much as a 2.5x improvement in performance-per-watt. Some of that gain comes courtesy of FinFET technology and the smaller process node, but much of the rest comes courtesy of Carrizo, AMD’s first APU to implement Adaptive Voltage and Frequency Scaling (AVFS). We extensively covered AVFS when AMD adopted it last year.
Traditionally, GPU and CPU manufacturers have used a different method of controlling voltage and frequency, called Dynamic Voltage and Frequency Scaling (DVFS). DVFS works by adjusting a CPU’s voltage to match its frequency in stairstep fashion. The response curve is set by the OEM as part of the CPU family’s specification and is designed to ensure a significant margin of error is available at all times. As the slide above shows, when VDD drops, clock speed drops even farther to ensure stable operation.
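A minimal model of that stairstep behavior makes the cost clear. The voltage/frequency pairs below are hypothetical: because the table must bake in worst-case margins for the slowest silicon, even a small droop in VDD drops the clock by a full step.

```python
# Sketch of DVFS stairstep behavior: frequency only changes at discrete
# voltage thresholds, each carrying a built-in worst-case safety margin.
# The table values here are hypothetical, not AMD's actual curve.

DVFS_TABLE = [  # (minimum voltage in V, allowed frequency in MHz)
    (0.80, 300),
    (0.90, 600),
    (1.00, 900),
    (1.10, 1150),
    (1.15, 1266),
]

def dvfs_frequency(vdd):
    """Highest stairstep frequency permitted at the supplied voltage."""
    freq = 0
    for min_v, f in DVFS_TABLE:
        if vdd >= min_v:
            freq = f
    return freq

print(dvfs_frequency(1.12))  # 1150 -- a droop below 1.15V costs a full step
print(dvfs_frequency(0.95))  # 600
```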
AVFS is implemented by monitoring each individual die at specific points and calibrating its voltage and frequency targets on a per-chip level. While this requires an extensive sensor network, there are two significant payoffs. First, it allows AMD to reduce per-part performance variation — each chip should be capable of hitting closer to maximum theoretical performance. Second, it gives AMD the ability to reduce its margin of error and operate closer to an ideal frequency / voltage curve.
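The difference between the two schemes can be sketched numerically. The linear voltage model, margin values, and “chip speed” factor below are invented purely to illustrate the idea of replacing a one-size-fits-all worst-case margin with a measured, per-die one.

```python
# Illustrative contrast between DVFS and AVFS voltage selection.
# All coefficients are made up; the point is that per-die calibration
# lets the smaller-margin curve sit closer to the silicon's true limit.

def dvfs_voltage(freq_mhz):
    """Worst-case (binned) voltage: must cover the slowest possible die."""
    return 0.60 + freq_mhz * 0.00045 + 0.05   # base + slope + large margin

def avfs_voltage(freq_mhz, chip_speed):
    """Per-chip voltage: on-die sensors measure this die, so margin shrinks.
    chip_speed > 1.0 means faster-than-nominal silicon."""
    return 0.60 + (freq_mhz * 0.00045) / chip_speed + 0.01  # small margin

f = 1266  # hypothetical target clock in MHz
print(round(dvfs_voltage(f), 3))        # 1.22 V for every chip in the bin
print(round(avfs_voltage(f, 1.05), 3))  # 1.153 V on a faster-than-nominal die
```

Since dynamic power scales with voltage squared, even a few hundredths of a volt saved per chip translates into a meaningful efficiency gain.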
Polaris’ more efficient delta color compression, larger L2 caches, and a new, optimized multi-bit flip-flop (MBFF) approach also helped AMD cut total ASIC power consumption by 4-5%.
The asynchronous compute capability in Polaris is fundamentally similar to what AMD introduced with Fiji. Fiji included a hardware scheduling block (HWS) that could be used to improve asynchronous compute workload efficiency. Polaris likewise includes a quick-response queue for implementing asynchronous time warp in VR applications and the ability to reserve compute unit blocks for executing TrueAudio workloads.
Many of these capabilities were also included in Fiji, but weren’t fully enabled or exposed when the hardware shipped. AMD has updated its software drivers for Fury and Nano cards to expose these capabilities and included them in Polaris as well. In this respect, the RX 480 has all the asynchronous compute capabilities that AMD loaded into Fury X, but in a $200 GPU instead of a $600 card.
We tested the RX 480 using an Asus X99-Deluxe motherboard, an Intel Core i7-5960X CPU, 16GB of DDR4-2667, and an Antec 750W 80 Plus Gold power supply. The GeForce GTX 960 and 970 were tested using Nvidia driver 368.39; the RX 480 was tested with an unreleased Catalyst driver; the other AMD cards were tested with Catalyst 16.6.2 Hotfix.
There are a few additional things to be aware of. First, we tested the HD 6870 with AMD’s Catalyst 15.7 beta drivers. AMD dropped support for its pre-GCN hardware when it launched Radeon Software; the Catalyst 15.7 betas were what the company’s auto-detection application recommended for this GPU.
Second, we shifted from a 1200W Thermaltake PSU down to a 750W Antec 80 Plus Gold PSU for this review. Our power consumption benchmarks for all GPUs were rerun on this hardware, which is why the data in this review won’t match other coverage.
Third, while we’ve included Ashes of the Singularity benchmark data in this review, we need to note that performance in Ashes has changed markedly on both AMD and Nvidia cards since we last covered the title. Toggling the asynchronous compute feature off vs. on no longer has any impact on Maxwell GPUs from Nvidia. We’ve confirmed this with extensive testing, including falling back to older drivers we used in our February article.
Fourth, while we normally include both 1080p and 4K results, the RX 480 is not intended for 4K and does not perform well at that resolution (we checked). Given this, we’ve skipped 4K benchmarks and stuck to 1080p. We’ve also specified how much RAM each GPU carries in our lists of results.
BioShock Infinite is a DirectX 11 title from 2013. We tested the game in 1080p with maximum detail and the alternate depth-of-field method. While BioShock Infinite isn’t a particularly difficult lift for any mainstream graphics card, it’s a solid last-gen title based on the popular Unreal Engine 3.
Our first title shows the 8GB RX 480 competing well against the GeForce GTX 970 and the R9 390. While slightly edged out by both last-gen cards, the RX 480’s lower price tag more than compensates.
As far as whether the RX 480 provides a qualitatively different experience than previous $200 cards, it definitely outclasses both the HD 6870 and the R9 270X, both of which would see dips below 60 FPS in regular gameplay. The R9 380 and RX 480 perform identically as far as the naked eye can tell, as do the GTX 960 and GTX 970.
Company of Heroes 2 is an RTS game that’s known for putting a hefty load on GPUs, particularly at the highest detail settings. Unlike most of the other games we tested, COH 2 doesn’t support multiple GPUs. We tested the game with all settings set to “High,” with V-Sync disabled.
COH2 isn’t playable on the HD 6870 at these settings, and the R9 270X, GTX 960, and R9 380 aren’t great, either. The RX 480 is a hair faster than the GTX 970 and essentially ties with the R9 390 in terms of overall performance.
Metro Last Light Redux is the remastered version of Metro Last Light with an updated texture model and additional lighting details. Metro Last Light Redux’s benchmark puts a fairly heavy load on the system and should be seen as a worst-case run for overall game performance. We test the game at Very High detail with SSAA enabled.
The RX 480 is 1.42x faster than AMD’s previous R9 380, and only a hair slower than the GTX 970 (4%). The R9 390 wins this test decisively, by 15%, easily the largest gap we’ve seen open up between the two cards to date. The RX 480’s improvement over its predecessors makes this the first $200 GPU from AMD that we’d say is realistically capable of running these detail levels. At the same time, our use of SSAA likely explains why the R9 390 pulls ahead to such a degree — supersampled antialiasing is a brute-force method of improving visual quality and the R9 390 has more raw power at its disposal.
Total War: Rome II is the sequel to the earlier Total War: Rome title. It’s fairly demanding on modern cards, particularly at the highest detail levels. We tested at maximum detail levels, with SSAO and Vignette enabled.
Total War: Rome 2’s performance on the HD 6870 is an example of how frame rates don’t always capture everything there is to know about how a game performs on two different GPUs. While the HD 6870 appears to match the R9 270X (and believe me, we re-ran that test several times), the HD 6870’s frame rate delivery is much more erratic than on the later GPUs. That’s to be expected, given the HD 6870’s small frame buffer, but the overall frame rate was still surprising.
Rome 2 runs better, on the whole, on Team Green hardware. The gap between the RX 480 and the GTX 970 is fairly significant, though the R9 390 is itself within shooting distance of the GTX 970.
Shadow of Mordor is a third-person open-world game set between The Hobbit and The Lord of the Rings. Think of it as Grand Theft Ringwraith, and you’re on the right track. We tested at maximum detail in 1080p with FXAA enabled (the only AA option available).
The RX 480 picks up a clear win here over the GTX 970, losing only to the older, higher-end Hawaii-based R9 390. Overall performance is excellent for Team Red.
Dragon Age: Inquisition is one of the greatest role-playing games of all time, with a gorgeous Frostbite 3-based engine. While it supports Mantle, we’ve actually stuck with Direct3D in this title, as the D3D implementation has proven superior in previous testing.
While DAI does include an in-game benchmark, we’ve used a manual test run instead. The in-game test often runs more quickly than the actual title, and is a relatively simple test compared with how the game handles combat. Our test session focuses on the final evacuation of the town of Haven, and the multiple encounters that the Inquisitor faces as the party struggles to reach the chantry doors. We tested the game at maximum detail with 4x MSAA.
We were forced to omit the HD 6870 from this test, since that GPU isn’t really capable of actually benchmarking the title at our chosen detail levels. Here, Teams Red and Green are evenly matched — the R9 380 and GTX 960 tie, as do the GTX 970 and the RX 480. Even the R9 390 is only barely faster.
Ashes of the Singularity is one of the first mainstream DirectX 12 titles. It’s an RTS game that’s designed to take full advantage of DX12 features like asynchronous compute and we’ve covered it since it launched in Early Access almost a year ago. We benchmarked the game in 1920×1080 with the Extreme detail preset.
The HD 6870 can’t run DX12, but the other cards perform fairly well. The R9 270X and GTX 960 aren’t fast enough to play at this resolution and detail level, but the R9 380 can still break 30 FPS at “Extreme” detail. The GTX 970 is significantly faster than the R9 380, but it’s not quicker than the RX 480, which outpaces it by 12%.
The R9 390, on the other hand, is faster still. This gap is due to differences in how the two GPUs handle asynchronous compute — while the RX 480 only picked up about 3% from enabling the feature, our tests showed that the R9 390 still gains 12% from using the capability. That’s enough to give the R9 390 the overall win in this particular test.
The RX 480 has demonstrated that it can hang with the top dogs in its price band as far as overall performance — but what about power efficiency? This has long been the Achilles heel of the GCN family and AMD promised that we’d see dramatic improvements when RX 480 finally launched. Did the company deliver on its promise?
To find out, we measured power consumption at the wall while benchmarking Metro Last Light Redux, then averaged the values across the benchmark run to produce an average power consumption figure. Since raw power consumption alone isn’t all that useful, we also give data in terms of watts per frame — how many watts of power does it take to generate each frame of animation?
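The metric itself is simple division: average wall power over the benchmark run divided by average frame rate. The card names and figures in the sketch below are hypothetical stand-ins, not our measured results.

```python
# Watts-per-frame efficiency metric: average system power draw at the wall
# divided by average frames rendered per second. Lower is better.
# The example cards and numbers below are illustrative, not measured data.

def watts_per_frame(avg_wall_watts, avg_fps):
    """Energy cost of each frame of animation, in watts per frame."""
    return avg_wall_watts / avg_fps

# hypothetical cards: (average wall power in W, average FPS)
cards = {"GPU A": (285, 52), "GPU B": (240, 50)}
for name, (watts, fps) in cards.items():
    print(f"{name}: {watts_per_frame(watts, fps):.2f} W per frame")
```

Note that a card with lower absolute draw can still score worse here if its frame rate drops faster than its power consumption does — which is exactly the pattern the graphs below reveal.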
For this specific test, we’ve also included data on the AMD R9 Nano. While the Nano wasn’t benchmarked for the rest of this review (its $500 price point means you could buy two RX 480s for the price of one Nano), it’s the most power-efficient GPU AMD has ever built and won accolades for giving AMD a 28nm GPU that finally closed to within spitting distance of Nvidia’s power efficiency last generation.
The RX 480 is an obvious improvement for AMD in absolute terms, given that it draws 150W less power than the R9 390, despite both of those GPUs having 8GB of RAM. The 2GB GTX 960 and R9 270X draw less absolute power, but the GTX 970 draws more, despite having just 4GB of RAM (3.5GB + 512MB). The R9 380 and even the R9 Nano also draw more power than AMD’s new 14nm chip.
Let’s see what happens when we factor in performance.
Power consumption and efficiency graphs don’t normally make for exciting reading, but I think this one is rather fascinating. First, it shows that the RX 480 has made huge improvements to GCN’s power efficiency — the RX 480 uses just 57% as much power as the R9 270X per frame of output. The R9 390 is the best comparison point as far as total RAM loadout, since both cards have 8GB, and the RX 480 still compares extremely well with that GPU.
Second, this data illustrates how lower-end GPUs aren’t always the most power efficient parts. The GTX 960 has the lowest absolute power draw, but it consumes significantly more power per frame than the GTX 970. Each chip has a sweet spot when it comes to balancing power consumption against overall performance, and the GTX 960 clearly isn’t as efficient when it comes to turning power into frame rate, even if its total power consumption is the lowest in our data.
Third, the RX 480 may be more efficient than any GDDR5-equipped GCN part, but it’s not the most efficient GPU AMD has ever shipped. That award still goes to the AMD Radeon R9 Nano, the last-generation $500 card with 4GB of HBM. If you think about it, however, this makes sense. Nano was never a performance play; it was a card designed to pack as much GPU as possible into an extremely small form factor. The 28nm chips AMD used for the Nano were the best of the Fury X parts, and the chip itself runs at lower frequencies and voltages. (Nano’s raw frame rate score of 61 FPS is part of the reason why its watts-per-frame rating is so low).
I kept the Nano comparison in this graph because it gives us the opportunity to evaluate some of the claims AMD made about power consumption and GDDR5 last year when it launched the HBM-equipped Fury family. The RX 480’s GPU is clearly more efficient than anything AMD has previously shipped, but using 8GB of 8Gbps GDDR5 clearly cost the company some overall power efficiency.
One thing this graph highlights extremely well is that GPU power efficiency really has to be compared between specific cards, not overall architectures. The GTX 960-equipped system uses 1.21x more power than the GTX 970 per frame of animation. The R9 270X uses 1.23x more power than the R9 380 and 1.91x more power than the R9 Nano, despite the fact that all of these cards are built on 28nm and based on the same GCN architecture with only modest differences between each generation.
The immediate takeaway from this data is that AMD’s RX 480 only manages to match the GTX 970, rather than surpass it, but I’m not sure that’s the right way to characterize the situation. The GTX 970 has half the RAM, clocked significantly lower — an apples-to-apples comparison between the two ASICs would almost certainly open up at least a modest lead for AMD on this front.
Ever since it launched GCN, AMD has struggled with the architecture’s power consumption. This wasn’t so much an issue against Nvidia’s Kepler, but Maxwell’s 28nm architecture demonstrated how superior power consumption could lead to superior GPUs. Cooler operating temperatures and lower power envelopes gave Nvidia more headroom to push its chip farther, while AMD struggled to match.
Fiji and Fury X were proof of this. The switch to High Bandwidth Memory (HBM) saved AMD enough power that it could build an enormous GPU without an unsustainably high TDP, while Nano demonstrated the flexibility of an HBM-equipped form factor — but both chips were still based on an architecture that drew a great deal of power.
The RX 480, in contrast, is much more efficient than any design AMD has previously shipped. Equally important, it dramatically improves overall efficiency without compromising on performance. The RX 480 beats the R9 380 in every single test, it wins or ties a majority of its GTX 970 match-ups, and it’s within spitting distance of the R9 390 in three of our seven benchmarks.
Whether or not the RX 480 is a worthy upgrade will, as always, depend on when you last upgraded and what features and games you care about. If you’re looking for a GPU that’ll handle 4K smoothly, both now and in the future, a $200 – $250 GPU isn’t going to cut it yet. It’s not yet clear if the 8GB version of the RX 480 will prove to be a better buy than the 4GB variant — while it’s true that games tend to use more VRAM than they used to, no game we’re aware of pushes a 4GB frame buffer in 1080p.
Assuming that the 4GB version of this card is only slightly slower than the 8GB model, the $200 price tag makes it a very nice upgrade from any previous AMD card in this bracket, particularly the R9 270X or HD 6870. If you’ve held off for the last few cycles waiting for a genuinely significant upgrade, the wait is over. The RX 480 also competes extremely well against the GTX 970, since that card only offers a 3.5GB effective frame buffer. Between a last-gen Nvidia GTX 970 or a 4GB variant of the GTX 960 and the RX 480, we’ll take the RX 480 every time.
The really hard call here is how to rate the R9 390. That GPU starts at $279 ($259 after rebate) on Newegg, packs the same 8GB of RAM as the RX 480, and offers equal or higher performance. Power efficiency, however, isn’t all that good — at eight hours of gaming per day and 12 cents per kWh, the 150W difference in power consumption works out to roughly $51 a year in electricity. If you upgrade every two years, that’s about $4.32 a month for choosing the R9 390 over the RX 480.
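For anyone who wants to check that arithmetic (we use 30-day months, which is how the figures above fall out):

```python
# Added electricity cost of running a card that draws 150W more,
# at 8 hours of gaming per day and $0.12 per kWh.

def extra_cost(delta_watts, hours_per_day, dollars_per_kwh, days):
    """Extra electricity cost over `days` days of use."""
    kwh = delta_watts / 1000 * hours_per_day * days
    return kwh * dollars_per_kwh

monthly = extra_cost(150, 8, 0.12, 30)   # 30-day month
yearly = extra_cost(150, 8, 0.12, 360)   # twelve 30-day months
print(f"${monthly:.2f}/month, ${yearly:.2f}/year")  # $4.32/month, $51.84/year
```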
If you already own an R9 290 / 290X or anything from AMD’s Fury family, the RX 480 isn’t a GPU you’ll want to upgrade to. Those higher-end cards won’t be replaced until later this year, when AMD launches its top-end Vega architecture. What AMD has done, however, is demonstrate that it can build an extremely capable and power-efficient chip that meaningfully improves its position in the market.
On the whole, AMD’s RX 480 delivers what the company promised: a significant leap forward for its GCN products and a top-notch card in the $200 to $250 range.