Recent developments point to how ARM processors will develop over the next decade or so. Here's what we have to look forward to.
Since ARM announced the 64-bit ARMv8 ten years ago, the architecture has been gaining a lot of traction. While its largest market by far is still low-power embedded devices, it's now available in full-fledged personal computers, datacenters, and even supercomputers, including the current top dog in the field, Japan's Fugaku, built around Fujitsu's monster A64FX.
In the x86 world, to increase the width of the vector extensions, Intel and AMD have to introduce new architectures. The new Willow Cove architecture, for example, supports 512-bit vectors, which hold eight 64-bit scalars in a single vector. That can deliver a huge boost for applications that can use vectors that wide, but it also requires new binaries.
ARM's SVE allows code to use vectors of any size, in 128-bit increments, with a single binary. That means an application compiled with SVE works just fine with 128-bit vectors, yet can take advantage of 256-bit or 512-bit vectors on another processor, such as the A64FX. The maximum vector size is a huge 2048 bits, so there's quite a bit of room to grow. The binary compatibility goes in both directions, so code written for an HPC processor can also run on a low-power IoT device.
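To make the "single binary, any vector size" idea concrete, here is a minimal sketch in portable C that only simulates the pattern; it does not use real SVE instructions. The names `lanes_per_vector` and `vla_add` are hypothetical, invented for this illustration. The point is that the loop asks for the vector width at run time and masks off unused lanes, so the same code gives the same answer at 128, 512 or 2048 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many 64-bit lanes fit in a vector of vl_bits.
   SVE vector lengths run from 128 to 2048 bits in 128-bit steps. */
size_t lanes_per_vector(unsigned vl_bits)
{
    assert(vl_bits >= 128 && vl_bits <= 2048 && vl_bits % 128 == 0);
    return vl_bits / 64;
}

/* Simulated vector-length-agnostic add: out[i] = a[i] + b[i].
   The width is never hard-coded. Each outer iteration stands in for one
   vector operation, and the per-lane "active" flag plays the role of a
   predicate that switches off lanes past the end of the array. */
void vla_add(const int64_t *a, const int64_t *b, int64_t *out,
             size_t n, unsigned vl_bits)
{
    size_t lanes = lanes_per_vector(vl_bits);

    for (size_t base = 0; base < n; base += lanes) {
        for (size_t lane = 0; lane < lanes; lane++) {
            int active = (base + lane) < n;  /* predicate bit for this lane */
            if (active)
                out[base + lane] = a[base + lane] + b[base + lane];
        }
    }
}
```

Calling `vla_add` on the same arrays with `vl_bits` set to 128 and then to 2048 produces identical results; only the number of outer iterations changes. Real SVE code would get the same effect from the compiler or from SVE intrinsics, with the hardware supplying the vector length.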
One downside of the original SVE is that its instructions were optimized for HPC, and as such were not well suited to consumer applications like signal processing. SVE2 expands the instruction set to cover signal processing as well.
Armv9 includes tensor instructions designed to accelerate machine learning applications, part of ARM's bid to expand its share of the rapidly growing machine learning market. Though tensor instructions have been available for ARM as an extension, they are now part of the core architecture.
The biggest suite of new features in Armv9 is dedicated to security, dubbed the “Confidential Compute Architecture.” Armv9 supplements the Memory Tagging Extension, which helps prevent exploits based on buffer overflows, with Realms, which provide secure address spaces. An OS running within a realm can't access any memory outside of that realm, somewhat like a hypervisor with stricter security.
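For a feel of how memory tagging catches a buffer overflow, here is a rough software simulation in C. It is a sketch, not real MTE code: the hardware does this check on every load and store, while here it is an ordinary function. The specifics (4-bit tags, 16-byte granules, the pointer's tag held in bits 59:56) follow the published MTE design rather than anything in ARM's announcement, and the names `sim_alloc` and `sim_store` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE    16u   /* memory is tagged in 16-byte granules */
#define HEAP_BYTES 256u  /* size of the simulated heap (arbitrary) */
#define TAG_SHIFT  56    /* the pointer's tag sits in bits 59:56 */

static uint8_t heap[HEAP_BYTES];
static uint8_t granule_tag[HEAP_BYTES / GRANULE]; /* one 4-bit tag per granule */

/* Simulated tagged pointer: heap offset in the low bits, tag in bits 59:56. */
typedef uint64_t tagged_ptr;

/* Hypothetical allocator: tags every granule of the block and returns
   a pointer carrying the same tag. */
tagged_ptr sim_alloc(size_t offset, size_t len, uint8_t tag)
{
    assert(offset % GRANULE == 0 && offset + len <= HEAP_BYTES && tag < 16);
    for (size_t g = offset / GRANULE; g * GRANULE < offset + len; g++)
        granule_tag[g] = tag;
    return ((uint64_t)tag << TAG_SHIFT) | (uint64_t)offset;
}

/* Simulated store: returns 1 if the pointer's tag matches the tag of the
   granule being written, 0 if the access would have faulted. */
int sim_store(tagged_ptr p, size_t index, uint8_t value)
{
    uint8_t ptr_tag = (uint8_t)((p >> TAG_SHIFT) & 0xF);
    size_t addr = (size_t)(p & 0xFFFFFFFFFFFFFFull) + index; /* low 56 bits */

    if (addr >= HEAP_BYTES || granule_tag[addr / GRANULE] != ptr_tag)
        return 0;
    heap[addr] = value;
    return 1;
}
```

If two adjacent 32-byte blocks are allocated with different tags, a write through the first pointer at index 31 succeeds, while a write at index 32 lands in the neighbor's granule, mismatches its tag and is rejected. That is the overflow being caught.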
ARM also revealed that it's working on new features like Variable Rate Shading and Ray Tracing for its upcoming Mali GPU core.
ARM outlined plans for continuing to improve generational IPC, through a combination of architecture improvements, manufacturing improvements, and faster memory.
Generational performance gains look set to slow in future generations, most likely because the last few iterations have already picked the low-hanging fruit. Even so, the pace of improvement still promises to be impressive, almost certainly enough to keep AMD on its toes even if Intel doesn't get its act together.
On the mobile side, the first Armv9-based Cortex and X1 cores will probably be released later this year.
Meanwhile, at GTC, nVidia unveiled Grace, its first Neoverse-based processor for datacenters, due in 2023. Grace will replace the EPYC processors nVidia currently uses in its A100-based systems. This gives nVidia a way to bring the CPU side of its GPU systems in house, and also to replace the PCI Express link between the CPU and GPU with a much faster NVLink4 interface.
Although the architectural changes in Armv9 don't seem that significant on the surface, the future of ARM is very bright.