Phil Rhodes examines the nature of AI, how it will affect video production, and whether it really is the solution to all our problems.
There's a lot of excitement at the moment about the ability of artificial intelligence, most often expressed through a neural network, to bend some of the rules of technology and information that were previously considered inviolable. For instance, we know there's no completely reliable way to automatically produce mattes of objects in scenes, although a human, working hard enough with a graphics tablet, can produce an effective rotoscope. The purpose of AI is to take some of that human ability, that knowledge of how the world behaves, based on a lifetime of experience, and give it to a computer.
It's important to be aware of how early we are in the development of all this, especially given the enthusiasm of companies who'd very much like us to believe that AI will allow their imminent releases to enjoy hitherto unimagined capabilities. Recently we looked at an algorithm for recolouring images that, well, occasionally does a good job. That's impressive enough, but it's still far from a push-button solution to all the world's monochrome photos, let alone monochrome full-motion video.
The thing to understand about AI, however, is that it potentially allows us to break, or at least bend, one of the key rules of information theory. Think of a low-res photo, or something shot slightly out of focus. We're not talking about the de-blur features of software such as Photoshop, which can work shockingly well, provided the blur was caused by motion and that motion had two components at a conveniently large angle to one another. We're talking about an image which completely lacks high-frequency information, perhaps due to poor focus or simply an inadequate number of pixels.
In this case, the information simply isn't there. It might be a picture of a tree and we might, as humans with experience of trees, have a pretty good idea that the mottled texture at the top probably represents leaves, but that information is missing from the photograph and it cannot be recovered. What can happen, however, is that a reasonable facsimile of that information can be estimated. Information theory is still satisfied because we aren't recovering information from the low-res image; we're estimating it, interpolating if you like, based on lots of other pictures of trees. The result is unlikely to be identical to what the full-resolution photograph would have looked like, but the difference might be completely undetectable to a human observer.
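To make the idea a little more concrete, here's a minimal sketch of the kind of network that does this sort of estimation, loosely in the spirit of early super-resolution research such as SRCNN. It's written in Python with PyTorch purely for illustration; the framework, the class name and the layer sizes are our assumptions rather than a description of any shipping product.

```python
# A minimal sketch of learned super-resolution, loosely in the spirit of SRCNN.
# Assumes PyTorch; layer sizes are illustrative, not taken from any real product.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySuperRes(nn.Module):
    def __init__(self, scale=2):
        super().__init__()
        self.scale = scale
        # Three small conv layers: extract features, map them, reconstruct pixels.
        self.extract = nn.Conv2d(3, 64, kernel_size=9, padding=4)
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        self.reconstruct = nn.Conv2d(32, 3, kernel_size=5, padding=2)

    def forward(self, low_res):
        # Naive upscale first; the network then estimates the missing detail.
        x = F.interpolate(low_res, scale_factor=self.scale, mode="bicubic",
                          align_corners=False)
        x = F.relu(self.extract(x))
        x = F.relu(self.map(x))
        return self.reconstruct(x)

# Training (not shown) would minimise the difference between the network's output
# and genuine high-resolution images -- the "lots of other pictures of trees"
# mentioned above. The detail it paints in is an estimate, not a recovery of
# information that was never captured.
```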
Right now, if you want to do that, if you want your out-of-focus photo touched up to look sharp, that's going to require a human artist, a copy of Photoshop, a graphics tablet from the Wacom corporation, and a lot of time and money. There is, however, nothing in principle stopping an artificial intelligence, perhaps in the form of a neural network, from doing that work. Get it good enough and the result is as close to a tool that automatically increases resolution as the laws of reality will permit.
To reiterate: this is currently fantasy, but it is not formally impossible. It's an example of something that's only barely starting to happen, particularly in cell phones, where the massive market allows a lot of time to be spent on R&D. Not that many cell phone apps currently claim to use artificial intelligence, but the general idea of taking a less-than-ideal image and processing it into a closer-to-ideal one is becoming more popular. An example is the depth-of-field trick that's recently become available to many cell phone users, in which the user slides the phone upward during an extended exposure. The result is a series of views of the scene from slightly different positions, which allows the software to estimate depth and apply an appropriate blur.
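As a rough illustration of that two-step idea, estimate depth from two slightly offset views and then blur according to depth, here's a sketch using OpenCV's stock block-matching stereo. It assumes a rectified, horizontally offset pair of 8-bit grayscale views (the phone's vertical sweep is the same principle turned on its side), and the function name and parameters are our own; this is a simplification of the principle, not a description of what any particular handset actually does.

```python
# A rough sketch of "estimate depth, then blur by depth", assuming OpenCV and
# two grayscale views of the scene from slightly offset camera positions.
import cv2
import numpy as np

def fake_shallow_dof(left_view, right_view, colour_image, focus_disparity):
    # Block matching gives a rough disparity map: nearer objects shift more
    # between the two views than distant ones.
    matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
    disparity = matcher.compute(left_view, right_view).astype(np.float32) / 16.0

    # Blur strength grows with distance from the chosen focus plane.
    blur_amount = np.abs(disparity - focus_disparity)
    blurred = cv2.GaussianBlur(colour_image, (21, 21), 0)

    # Blend sharp and blurred versions per pixel according to blur_amount.
    weight = np.clip(blur_amount / (blur_amount.max() + 1e-6), 0.0, 1.0)[..., None]
    return (colour_image * (1.0 - weight) + blurred * weight).astype(np.uint8)
```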
In practice, the phone feature isn't perfect. It tends to look a little rough around the edges of foreground objects in particular, and most of the work is done using dual-pixel stereoscopic depth-sensing techniques rather than AI. Google's "lens blur" filtering does have some contribution from AI, which creates a mask of things that are likely to be people so they can be kept sharp, though most of what's being done uses conventional stereoscopic techniques. What's important is that many of the things we might choose to do in post, things like relighting, fog, depth and focus control, rely on the hugely labour-intensive task of rotoscoping. Humans have to do rotoscoping because, at present, only humans understand the world well enough to know which parts of an image represent a person wearing camouflage-pattern clothing and which parts are actually a pot plant. Google's technique is a very, very early hint at something which might help to solve that problem.
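The machine-generated "people matte" part of this is already possible, in a hedged sort of way, with off-the-shelf tools. The sketch below leans on torchvision's pretrained DeepLabV3 segmentation model purely as a stand-in; whatever Google actually runs on-device isn't public, so treat the model choice, the preprocessing and the function name as our assumptions.

```python
# A sketch of a machine-generated person matte using an off-the-shelf
# segmentation model. Assumes PyTorch, torchvision and Pillow are installed.
import torch
from torchvision import models, transforms
from PIL import Image

def person_matte(path):
    # "DEFAULT" fetches pretrained weights in recent torchvision releases;
    # older releases used a pretrained=True flag instead.
    model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    image = Image.open(path).convert("RGB")
    with torch.no_grad():
        output = model(preprocess(image).unsqueeze(0))["out"][0]
    # Class 15 in the Pascal VOC label set used by this model is "person":
    # the matte is 1.0 wherever the model believes there's a person, 0.0 elsewhere.
    return (output.argmax(0) == 15).float()
```

The result is nowhere near a finished rotoscope, of course; edges are soft and camouflage-versus-pot-plant confusions still happen, which is exactly the point the article is making.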
There's even been talk of applying AI to writing software, or, to put it another way, of creating self-writing software. Terminator-style nightmare scenarios aside, there are a few problems in software engineering which could potentially be approached this way. In particular, the issue of splitting workloads between multiple processor cores has become crucial as the per-core speed of CPUs has essentially stagnated in recent years. Writing multi-threaded code to make the best use of multiple cores relies on a lot of manual work by software engineers, often using locking techniques to deliberately stall one thread while a second thread does work that the first thread needs. This is inefficient, but some fundamental theoretical issues make it very difficult to offer much automated assistance.
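To see why this is such grunt work, here's a toy example of the sort of hand-written coordination just described: one thread deliberately stalls on a lock until a second thread has produced the value it needs. Python is used here for brevity only; the same pattern, with the same inefficiency, turns up in C++, Java and everywhere else.

```python
# One thread waits, doing nothing useful, until another thread hands it a result.
import threading

result = None
ready = threading.Condition()

def producer():
    global result
    value = sum(range(10_000_000))   # stand-in for some expensive work
    with ready:
        result = value
        ready.notify()               # wake the waiting thread

def consumer():
    with ready:
        while result is None:
            ready.wait()             # deliberately stalled until the producer finishes
    print("consumer got", result)

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
```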
The difficulty with all this is that there is no way to prove that an AI has done a good job, other than by having a real intelligence (a human) evaluate the results. Do a mathematical calculation on an image and the result is demonstrably correct or incorrect. Do some work on it with an AI and there's no unequivocal way of evaluating the result. This is a particular issue with the idea of AI writing software, given that there's no general way to formally prove that software is correct in any case (a fact which is the bane of compiler engineering). So far, AI has been applied most successfully to things like natural language handling and other similarly fuzzy situations where a degree of error is expected and tolerable. That's not the case in software, which is (theoretically) expected never to err.
Other problems seem more solvable. Most attempts at AI image processing to date involve stills. Process one frame with an AI and the adjacent frames need sufficient continuity with it to maintain the illusion of real-world, live-action photography. Expand that to adjacent shots, which need similar continuity, and then to an entire film, where we might return to a location and expect to see the same tree, and the problem becomes complicated, though again, not theoretically impossible. The issue is not so much whether an AI could deal with that situation; the issue is when that will become possible, how much it will cost, and whether it's worth just going back and shooting the tree again, this time in focus.
Ultimately, the idea of simply feeding a script into an algorithm and having a finished motion picture master drop out the other end of the machine has been mooted. If that's feasible (and it probably is), we're decades away, even if people actually want it to happen, and we might assume there'll be some objection to it. At the end of the day, people don't make movies because it'll make them rich, because in the overwhelming majority of cases it won't (a few extremely high-profile exceptions notwithstanding). People make movies because it tickles them. Similarly, there's no real need for actors to perform the same play six nights a week; we could just film it. We do these things because there's arbitrary value in the human endeavour involved, and that feeling has been shown to be somewhat resistant to exactly the sort of technological change we're discussing.
Perhaps the best thing AI could do is take some of the grunt work out of filmmaking: the tedium of rotoscoping, the disaster of a focus buzz in the middle of that perfect take. We're some way, certainly many years and probably decades, from having truly push-button solutions to those things, but the ability to bend the rules of information theory is likely to make AI an increasingly important part of modern computing.