Nvidia's recent demonstration of AI video manipulation gives us a glimpse of the future. Will it ever be perfect? Phil Rhodes looks further.
When we talked about AI a few weeks ago, we discussed the idea of generating entire feature films. It's still an open question whether that's possible even in theory. Would the required neural network be larger than the observable universe, or is there some smart way to optimise things? Either way, Nvidia has now released some video footage that's being widely billed as AI-generated.
Which it isn't. It's actually AI-altered: the input data is a dashboard camera's view of a trip along a road, and the output is an approximation of that same trip along the same road under different conditions – turning day into something approaching night, or a snowy road into a summer scene. We're being cautious about how we describe these results because they're some way from being perfectly convincing. Individual frames are very persuasive and the overall effect is reasonable, but the devil is in the details, and there are enough problems to make a viewer wonder what was wrong with the footage.
Perhaps we'll all have to become good at spotting the slightly glitchy results of AI manipulation. Perhaps that awareness will become part of the resistance to trickery that everyone needs to navigate life safely. On a more technical level, the glitches are more or less what you'd expect: the AI doesn't seem to grasp that if a street light exists in frames 1-10, it shouldn't fade out to nothing during frames 11-20 (video below). We might hazard a guess that it doesn't really understand what a street light is. It just understands that night videos of roadways sometimes contain orange dots, and that those orange dots tend to move in a certain way.
To return to the issue of sheer computational horsepower, though, there are two things worth mentioning. First, Nvidia's involvement makes sense, since neural networks are a good fit for GPU processing and the company is keen to promote AI on the basis that AI is likely to sell GPUs. Second, that's not an idle thought: Intel seems concerned enough about it that it announced its Lake Crest neural network processing hardware back in October, a product of Intel's purchase of AI startup Nervana. Exactly what the Nervana technology will be used for remains to be seen. If anything's clear, though, it's that if there's to be an AI boom, we're probably in its early stages.
We should be cautious, of course. The demos we're seeing at the moment, which cover synthesis of all kinds of things from images to speech to video, are persuasive, but it's worth being clear that most of them are not ready for the big time, if “the big time” means the results have to be completely convincing to humans. The speech synthesis results seem competitive with existing techniques, but the faces can be alarmingly misshapen and the video doesn't look quite right. There are some very good clouds (of which more in the future), but overall we shouldn't get too excited. Nonetheless, the quality of results is improving, and assuming there aren't any fundamental limitations (which there may be), the outlook is promising.