Back when Nvidia unveiled its second-generation Maxwell architecture, it delivered a huge improvement in performance-per-watt, all while keeping the same 28nm process node that powered its previous GPU architecture, codenamed Kepler. While the company shared some details on how Maxwell improved on Kepler, including a larger L2 cache, improved memory efficiency, and a new Streaming Multiprocessor (SM) configuration, it kept a number of cards extremely close to its chest.
Thanks to some investigative work by David Kanter of RealWorldTech, we now know one of the critical details that Nvidia didn't disclose. Unlike Kepler, or the various GPUs manufactured by Intel and AMD, Maxwell and the current-generation Pascal both use tiled rendering. The alternative, immediate-mode rendering, has been the industry standard for years. The full video and an explanation of the tools Kanter uses are embedded below, but we'll discuss the findings and their implications.
Those of you who have followed the graphics industry for the past 15 years may recall a GPU that emerged, for a brief time, as a potential challenger to ATI and Nvidia. The Kyro II was a tile-based rendering solution built on PowerVR technology that won some fame and market share as a potent low-cost solution. Unlike immediate-mode renderers, which draw the entire screen space left-to-right and top-to-bottom, tile-based renderers break a scene up into a tiled grid.
By breaking a scene into tiles, the GPU can work on each tile individually, rather than attempting to render the entire scene at once. The other advantage of tile-based rendering is that the GPU can test to see whether pixels will be visible in the final image before it textures and shades them. The classic distinction between tiled renderers and immediate-mode renderers is that IMRs suffer from some degree of overdraw, meaning that they spend time shading, lighting, and drawing pixels that are then thrown away without ever being shown to the end user. There’s a memory bandwidth and power cost to this, and apparently a significant chunk of Maxwell’s heralded efficiency came from adopting a tiled approach.
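The overdraw savings described above can be sketched in a few lines of code. This is a deliberately toy model, not Nvidia's or PowerVR's actual pipeline: two full-screen layers stand in for overlapping geometry, and the pixel counts are purely illustrative. The point is the ordering of work, since an immediate-mode renderer shades every covered pixel of every primitive, while a tiler resolves visibility within each tile first and shades each pixel once.

```python
WIDTH, HEIGHT, TILE = 8, 8, 4

# Two full-screen "primitives" as (depth, color) layers; the nearer
# layer (lower depth) hides the farther one at every pixel.
layers = [(1.0, "far"), (0.5, "near")]

def immediate_mode():
    """Shade every pixel of every primitive in submission order."""
    shaded = 0
    depth_buf = {}
    for depth, color in layers:
        for y in range(HEIGHT):
            for x in range(WIDTH):
                shaded += 1  # shading work is spent before the depth test decides
                if depth < depth_buf.get((x, y), float("inf")):
                    depth_buf[(x, y)] = depth
    return shaded

def tiled():
    """Resolve visibility per tile first, then shade only visible pixels."""
    shaded = 0
    for ty in range(0, HEIGHT, TILE):
        for tx in range(0, WIDTH, TILE):
            for y in range(ty, ty + TILE):
                for x in range(tx, tx + TILE):
                    # Find the nearest layer covering this pixel...
                    visible = min(layers, key=lambda layer: layer[0])
                    shaded += 1  # ...and shade it exactly once
    return shaded

print(immediate_mode(), tiled())  # 128 pixels shaded vs. 64
```

With just two overlapping layers, the immediate-mode path does twice the shading work; in real scenes with deep overdraw, the gap (and the associated bandwidth and power cost) grows accordingly.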
By using targeted tests, Kanter was able to show how AMD and Nvidia GPUs differ. The image above is from an AMD GPU using standard immediate-mode rendering. The test application draws triangles across the screen — and what we see above is one triangle replacing another, top-to-bottom, right-to-left.
Here’s an example of the same test running on a Maxwell GPU. Instead of a contiguous surface, we see a series of blocks — tiles — across the screen. Kanter spends time stepping through various test scenarios, illustrating how the size of the tiles shrinks as the amount of information within each tile grows. The current theory is that Maxwell and Pascal dynamically adjust tile size depending on how much work needs to be done in each tile. This keeps the amount of information that needs to be stored about each completed tile within whatever buffer or cache limits Nvidia has set.
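The buffer-driven tile sizing theorized above can be expressed as a simple heuristic. Everything here is hypothetical — the buffer size, per-primitive cost, and candidate tile sizes are made-up numbers, since Nvidia hasn't disclosed its actual limits — but it captures the observed behavior: tiles shrink as the geometry density inside them grows.

```python
BUFFER_BYTES = 2048             # hypothetical on-chip binning buffer
BYTES_PER_PRIM = 64             # hypothetical cost of one binned triangle
TILE_SIZES = [128, 64, 32, 16]  # candidate tile edge lengths, in pixels

def pick_tile_size(prims_per_pixel):
    """Choose the largest tile whose binned geometry fits the buffer."""
    for edge in TILE_SIZES:
        prims = prims_per_pixel * edge * edge
        if prims * BYTES_PER_PRIM <= BUFFER_BYTES:
            return edge
    return TILE_SIZES[-1]       # fall back to the smallest tile

# Sparse scenes get large tiles; dense scenes force smaller ones.
print(pick_tile_size(0.001))  # -> 128
print(pick_tile_size(0.01))   # -> 32
```

A scheme like this would let the hardware keep each tile's working set on-chip regardless of scene complexity, which is exactly the kind of bandwidth saving Kanter's tests point toward.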
There’s still much we don’t know about Nvidia’s solution. For starters, the typical tile-based renderers from Imagination Technologies are deferred renderers, while Maxwell and Pascal use a tile-based immediate-mode renderer. Exactly how this impacts power or GPU efficiency is unknown.
There’s no way to know exactly how much of Nvidia’s power savings in Maxwell were specifically the result of this new rendering method, but we’d bet it’s a non-trivial part of the equation. Nvidia has played this particular card tightly because switching from an IMR to a tile-based renderer gave it a significant advantage over AMD — an advantage the company wouldn’t want its main competitor to ferret out.
Then again, major tech ideas have a way of making their way into products from both companies, and tile-based rendering isn't a new idea — it's only unusual to see it in desktop hardware. Companies like PowerVR have been building tile-based rendering solutions for many years. We'll have to wait for Vega, AMD's first new architecture in five years, expected to debut at the end of this year or in early 2017, to find out whether AMD opts for a similar design or has some other tricks up its sleeve.