nVidia’s powerful Lovelace architecture has a battle royale on its hands in the shape of AMD’s exciting new Navi3.
In the last few weeks, both nVidia and AMD officially unveiled their new GPUs, reinvigorating a GPU war that had been subdued lately by chip shortages, shipping delays, and the general industry malaise that came with the broader global one.
But in spite of everything, engineering proceeds apace.
nVidia's Lovelace GPU is, in pretty much every sense of the word, a monster. And a bit of a monstrosity as well, being one single enormous die. The new architecture boasts impressive improvements across the board, including the fastest memory currently available, huge enhancements to ray tracing, compute, and tensor performance, and a record-setting power budget. The flagship 4090 doubles the already insane performance of its predecessor, the 3090, and thanks to improvements in efficiency, bloats the power budget by only another 30% or so.
It's only been available for a month, and already there are YouTube videos of 4090 power connectors melting under the enormous power draw.
One question became pretty obvious at that point: why did nVidia push the power envelope so far when it's already leading the pack in performance?
Last week we found out why: because AMD's Radeon Technologies Group isn't holding back any more.
As one might recall, when Lisa Su took over AMD, she made the strategic decision to focus the Radeon Technologies Group primarily on gaming consoles while the company divested itself of GlobalFoundries, shifted production to TSMC, and built several design teams to work in parallel on Zen CPU architectures.
As a result, the Vega GPUs stagnated for a while, though even then AMD had ambitious plans. Since both Microsoft and Sony wanted hardware ray tracing in their next-generation gaming consoles, those contracts enabled AMD to develop its ray tracing hardware. Navi2 featured AMD's first commodity hardware ray tracing, and by sharing engineering knowledge between the CPU and GPU design teams, AMD exceeded expectations for its GPU clock speeds. The actual number didn't seem like much, just a few hundred MHz, but when accounting for the massive parallelism that is part and parcel of a GPU, it added up to a huge boost in computing power.
AMD has been sharing another big feature of its CPUs with the GPU teams: chiplets. AMD has been using chiplets in its CPUs since the beginning of the Zen line, and it's been honing that technology ever since.
By using chiplets, AMD scored several wins for itself in Navi3.
The big one is that in spite of a gigantic total transistor count, the individual chiplets are still relatively small, which allows for greater economies of scale and fewer dies lost to wafer defects. That might not sound like much, but it's a huge advantage: AMD is charging $600 USD less for its flagship than nVidia is... because it can.
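To see why small dies matter so much, consider a simple Poisson yield model, where the fraction of good dies falls off exponentially with die area. The die areas and defect density below are illustrative assumptions for the sake of the sketch, not actual TSMC figures:

```python
import math

# Poisson yield model: Y = exp(-A * D0), where A is die area in cm^2
# and D0 is defect density in defects/cm^2. All numbers here are
# illustrative assumptions, not actual TSMC process data.
D0 = 0.2  # assumed defects per cm^2

def yield_rate(area_cm2, d0=D0):
    """Fraction of dies expected to be defect-free."""
    return math.exp(-area_cm2 * d0)

monolith = yield_rate(6.0)  # one hypothetical ~600 mm^2 monolithic die
chiplet = yield_rate(3.0)   # a hypothetical ~300 mm^2 compute chiplet
print(f"monolith: {monolith:.0%}, chiplet: {chiplet:.0%}")
```

Halving the die area much more than halves the scrap rate, and the small MCDs on the cheaper 6nm process fare better still. That, in a nutshell, is the economics behind AMD's pricing room.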
Another big win is that AMD can mix and match process technologies. Where nVidia, with its single giant monolith, has to use only TSMC's latest, greatest, most expensive, and lowest-yielding process, AMD is using 5nm for the compute chiplet (Graphics Compute Die, or GCD) and a much less expensive, mature 6nm process for the memory cache dies (Memory Cache Die, or MCD).
The RDNA3 flagship, the Radeon RX 7900 XTX, will feature 24GB of GDDR6 memory and 61 TFLOPs of GPU compute throughput, a 54% improvement in performance per watt over RDNA2. The Infinity Cache has also improved, now with a peak bandwidth of an impressive 5.3 TB/sec. The compute units in Navi3 have been redesigned to improve density, which also makes for smaller dies, even though AMD doubled the instruction issue rate and added more floating point, integer, AND AI operations – including two dedicated AI accelerators per compute unit. Ray tracing performance has similarly increased by 50% per compute unit, which, combined with the increase in the number of compute units, adds up to a pretty significant improvement in ray tracing performance.
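That 61 TFLOP figure is easy to sanity-check with back-of-envelope arithmetic. The compute unit count and boost clock below are assumptions (widely reported for the 7900 XTX, but not part of the figures above):

```python
# Back-of-envelope FP32 throughput for the 7900 XTX.
# Assumed, not from AMD's presentation: 96 CUs and a ~2.5 GHz boost clock.
compute_units = 96
lanes_per_cu = 64       # stream processors (SIMD lanes) per compute unit
issue_rate = 2          # RDNA3's doubled (dual-issue) FP32 rate
flops_per_fma = 2       # a fused multiply-add counts as two FLOPs
boost_clock_hz = 2.5e9  # assumed boost clock

tflops = (compute_units * lanes_per_cu * issue_rate
          * flops_per_fma * boost_clock_hz) / 1e12
print(f"{tflops:.1f} TFLOPs")  # ~61.4, in line with AMD's quoted figure
```

Note that without the doubled issue rate, the same math lands at roughly half that, which is why that one architectural change carries so much of the generational gain.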
The RDNA3 media and display engines have also gotten some serious attention. The Radeon 7000 series now supports DisplayPort 2.1 and includes hardware encoding and decoding for AV1 at up to 8K at 60fps, in addition to the usual H.264 and H.265 suspects.
The only area where AMD hasn't pushed the envelope is on the PCIe side; it's still limited to PCIe 4.0 instead of 5.0, even though Zen4 is already using PCIe 5.0.
Overall, AMD is estimating performance at 1.5x to 1.7x that of RDNA2, which, while still trailing nVidia's Lovelace line, doesn't trail by much.
The power draw for the 7900 XTX is 355 watts, quite a bit more manageable than the nVidia 4090's 450 watts.
There are still a few unknowns though, mainly relating to professional content creation software. AMD's presentation mentioned some professional software like OBS and Handbrake, specifically calling out RDNA3 media engine support, but didn't mention other professional content creation applications like Resolve and Blender, or GPU-compute-heavy plugins like the suites from Red Giant and BorisFX. We don't know whether applications like Resolve or Scratch will be able to use the new AI engines to drive features like Resolve's Neural Engine, or whether renderers like Cycles and Karma can use them for GPU-accelerated noise reduction while rendering.
While the monster 4090 will most likely outperform even the 7900 XTX, the higher cost, both for the card itself and for the power supply upgrade, will likely give AMD an edge. For mobile workstations, native support from third-party content creation software will be critical, and that support is currently unknown. Expect a lot of mobile workstations with AMD CPUs and nVidia or Intel GPUs in the near future, since Intel and nVidia both have a head start on getting third parties on board.
The 7900 XT and XTX are due in December. It's going to be an exciting holiday season for gamers.