A new paper published by Adobe reveals details of a hugely impressive new technique that can upscale video by up to 8x.
There is, as you might well imagine, a new generative AI at the heart of this. The entertainingly named VideoGigaGAN was introduced in a paper Adobe published a couple of weeks ago, which you can find here. Essentially, the company claims it offers a better way of uprezzing video than previous Video Super-Resolution (VSR) models have managed, because it can produce videos with both high-frequency detail and temporal consistency.
Uprezzing still images has become commonplace. Here at RedShark we typically run our header pictures at 1280x720 (1920x1080 looks a bit better, but takes up correspondingly more storage, so we settled on the trade-off, though that might change as we move to WebP). I cannot remember the last time I worried about upconverting a lower-resolution image: just resize in Pixelmator Pro and its AI tool produces an eminently usable result, though we tend not to push it beyond 2x without at least some consequences.
Adobe’s VideoGigaGAN showcases a reliable 8x increase in video resolution, which is very impressive. Key to its success is its ability to solve the problem of temporal inconsistency, essentially preventing the introduction of the sort of AI hallucinations that result in six-fingered hands and the like.
For those who want to know more, the paper offers this: “To enforce temporal consistency, we first inflate the image upsampler into a video upsampler by adding temporal attention layers into the decoder blocks. We also enhance consistency by incorporating the features from the flow-guided propagation module. To suppress aliasing artifacts, we use Anti-aliasing block in the downsampling layers of the encoder. Lastly, we directly shuttle the high frequency features via skip connection to the decoder layers to compensate for the loss of details in the BlurPool process.”
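That description is dense, but the core idea behind temporal attention is simple: instead of upscaling each frame in isolation, let every pixel look at the same spatial position in the other frames and blend information across time, which discourages details from flickering from one frame to the next. Here is a minimal NumPy sketch of per-pixel self-attention across frames. This is my own illustration of the general mechanism, not Adobe's code; the function name and tensor shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(feats):
    """Attend across time at each spatial position.

    feats: array of shape (T, H, W, C) -- T frames of H x W feature
    maps with C channels. Returns an array of the same shape where
    each pixel is a weighted blend of that pixel across all frames.
    (Illustrative only: a real model would use learned query/key/value
    projections rather than the raw features.)
    """
    T, H, W, C = feats.shape
    x = feats.reshape(T, H * W, C).transpose(1, 0, 2)  # (HW, T, C)
    # Similarity of each frame's pixel to the same pixel in other frames.
    scores = np.einsum('ntc,nsc->nts', x, x) / np.sqrt(C)
    attn = softmax(scores, axis=-1)                    # rows sum to 1
    out = np.einsum('nts,nsc->ntc', attn, x)           # blend over time
    return out.transpose(1, 0, 2).reshape(T, H, W, C)
```

A useful sanity check of the intuition: if every frame's features are identical, the attention weights are uniform and the output equals the input, so static content passes through unchanged.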
Adobe is, of course, not the only company working on this. As The Verge points out, Adobe showed Project Res-Up during its MAX event back in October 2023, which does the same thing in a different way. And both Microsoft and Nvidia have also developed their own VSR upscaling technology.
For now, it’s important to remember that this is a research paper, not a product preview. The researchers admit that the model encounters challenges "when processing extremely long videos," which the paper defines as a meagre 200 frames or more. But it points at some potentially interesting things in the future, especially as networks struggle to cope with the demands that growing video traffic places on them. Codecs are one way through this, but if you can send an SD video and have it confidently uprezzed to 4K at the other end, you’ll save a lot of data.
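To put rough numbers on that saving: a 4K UHD frame carries 24 times as many pixels as an NTSC-style SD frame, so shipping SD and upscaling at the receiver could, in principle, cut the raw pixel data by that factor. These resolutions are my own illustrative choices, not figures from Adobe's paper, and real codec bitrates don't scale linearly with pixel count, so treat this as an upper-bound sketch.

```python
# Back-of-envelope pixel-count comparison (illustrative assumption:
# bitrate tracks raw pixel count, which real codecs complicate).
SD_PIXELS = 720 * 480        # NTSC-ish standard definition frame
UHD_PIXELS = 3840 * 2160     # 4K UHD frame

pixel_ratio = UHD_PIXELS / SD_PIXELS
print(pixel_ratio)  # 24.0 -- send 1/24th of the pixels, upscale the rest
```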