Along with resolution, format, dynamic range and aspect ratio, there will soon be another important piece of metadata to append to our video and still image files: authenticity.
I know there is work being done in this area, but I want to talk about it in the context of AI and how that might bring a new urgency to the question of how we can know that what we're seeing is genuine.
A few years ago - significantly, on the first of April - I saw a press release saying that an IT company was proposing a new protocol standard. The protocol would reserve certain bits in an Ethernet frame to signify whether the data contained in it could be trusted. In a secure environment, only data carrying this flag would be allowed in. For about a microsecond, it seemed to me like a good idea - and a remarkably simple one at that. And then I realised that not only is it the easiest thing in the world to flip a consistently placed bit in any data stream, but that this is exactly what any malevolent agent would do, immediately and without hesitation. What could be better for fraudsters than to be able to certify all their communications and transactions as honest?
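To see just how flimsy such a flag would be, here's a toy Python sketch. The frame layout and offsets are entirely hypothetical, not from any real protocol: setting the "trusted" bit takes a single bitwise operation, which is all a forger needs.

```python
# Toy illustration only: a hypothetical "trusted" flag living at a
# fixed bit position in a frame header. Nothing stops anyone who
# handles the raw bytes from simply switching it on.

TRUST_FLAG_OFFSET = 14   # hypothetical byte position of the flag
TRUST_FLAG_BIT = 0x01    # hypothetical bit within that byte

def mark_as_trusted(frame: bytearray) -> bytearray:
    """Flip the 'trusted' bit on - exactly what a fraudster would do first."""
    frame[TRUST_FLAG_OFFSET] |= TRUST_FLAG_BIT
    return frame

# A forged frame now passes any check that relies on the flag alone.
forged = mark_as_trusted(bytearray(64))
assert forged[TRUST_FLAG_OFFSET] & TRUST_FLAG_BIT
```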
It also brings to mind a very analogue truth that if someone feels the need to call themselves "Honest John", it is more than likely that they're not honest. In the same way, if you have to try to be nice, you're probably not a nice person intrinsically.
So merely flagging content as authentic is not going to do much, apart from fostering a false and somewhat dangerous sense of security amongst the gullible.
Why is this becoming important right now? I think it's because there are more and more ways to "improve" images. Simon Wyndham, RedShark's Editor, has shown us several examples recently.
It's only to be expected that there will be debates about the authenticity of a colourised hundred-year-old film. For what it's worth, I think the answer depends on the context of the question. Is a colour fabrication of an original black and white documentary about the First World War more engaging than the monochrome original? Almost certainly. More relatable? Definitely. More accurate in a court case that, for some reason, disputes the colour of someone's coat? Absolutely not. Truer as a historical record? That is genuinely debatable.
This last question throws the whole issue into sharp relief.
For a start, you have to ask: how was the colourisation done? If it was by an AI trained on modern-day objects and fashions, then it's going to be little better than guesswork. But what if the AI was trained by an expert in historical military uniforms and an authority on army equipment of the period? In that case, surely it's probable that the results would be at least as reliable as an account of the period in a well-regarded history book. Moreover, if we can trust words like "red", "blue" and "khaki", then we can trust an AI trained on the same semantics.
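As a toy illustration of what "trained on the same semantics" might mean, here's a hedged Python sketch. The labels, masks and colour values are placeholders I've invented for the example, not researched historical data: an expert-supplied table of period colours is applied to an already-segmented monochrome frame, preserving the original luminance.

```python
import numpy as np

# Hypothetical expert-supplied priors: semantic label -> period RGB colour.
# These values are placeholders, not researched historical colours.
PERIOD_COLOURS = {
    "uniform": (94, 88, 61),     # khaki
    "greatcoat": (70, 75, 80),   # grey-blue
    "sky": (160, 180, 200),
}

def colourise(grey: np.ndarray, masks: dict) -> np.ndarray:
    """Tint each labelled region toward its prior colour while keeping
    the luminance of the original monochrome frame."""
    out = np.repeat(grey[:, :, None], 3, axis=2).astype(np.float32)
    for label, mask in masks.items():
        r, g, b = PERIOD_COLOURS[label]
        lum = max((r + g + b) / 3.0, 1.0)
        tint = np.array([r, g, b], dtype=np.float32) / lum
        out[mask] = np.clip(out[mask] * tint, 0, 255)
    return out.astype(np.uint8)

# Usage: grey is an (H, W) uint8 frame; masks maps each label
# to a boolean (H, W) array marking that region.
```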
There will be questions about compression codecs too, just as there have been in the past. When long GOP compression started to become commonplace around the start of the century, police forces began to get pushback from legal authorities. The issue was that if "some of the frames are made up out of nothing", then any video encoded with this technique could not be relied on as evidence.
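To make "made up out of nothing" concrete: in a long GOP stream, only the occasional I-frame is a complete picture; the P- and B-frames between them are predicted from their neighbours. Here's a short Python sketch (assuming ffprobe is installed; the filename is hypothetical) that counts the frame types in a clip:

```python
import json
import subprocess
from collections import Counter

def gop_frame_types(path: str) -> Counter:
    """Count I/P/B frames in a video stream using ffprobe.
    In a typical long GOP encode, I-frames are a small minority;
    everything else is predicted rather than stored whole."""
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-select_streams", "v:0",
         "-show_frames", "-show_entries", "frame=pict_type",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    frames = json.loads(result.stdout)["frames"]
    return Counter(f["pict_type"] for f in frames)

# e.g. Counter({'P': 180, 'B': 114, 'I': 6}) - only the I-frames are whole pictures
print(gop_frame_types("evidence_clip.mp4"))
```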
I think this has since resolved itself, because to question the technique you would have to find examples where it could genuinely mislead a judge or a jury. About the only case I can think of - and this would never happen in the real world - would be footage of a bullet that spontaneously, and against all known laws of physics, changed direction between keyframes. But, of course, the same objection could apply to the gaps between the actual frames themselves. Granted, those gaps can cause confusion through temporal aliasing - the "wagon wheel" effect - but if you were looking for evidence that specific, you'd be using other types of test equipment.
So while there's a precedent for legal reluctance to rely on video evidence that's been compressed in one way or another, I think it's essentially missing the wood for the trees - or, in this case, missing the crime for the compression. But will these objections still be valid when we start to use AI to compress video?
AI is already being used to compress video, and it's doing a remarkable job. For two years now, Samsung 8K televisions have used AI to upscale 4K and HD pictures to 8K. I had an 8K Samsung TV in my living room for a while, and to my eyes, it worked. What I mean by "worked" is that I definitely saw better than HD or 4K pictures on the 8K display. I could not see any apparent artefacts or inaccuracies, nor did I see any sign that the AI had "invented" stuff that wasn't there in the original. That was nearly two years ago, and a lot has happened since then in the field of AI.
Think about this: every time AI is used to interpret or improve video or "guess" what should be there from what's around it, it could also be compressing the data in the picture to give a similar perceived quality. It would do this by reducing the images to a set of what I call "conceptual vectors" that signify the essence of the image. I think this could work very well, and we are already starting to see AI-assisted compression lowering the bandwidth needed for video calls while improving the quality, too (and, in the case of Nvidia's efforts, modifying the image so that a video caller's eyes are actually looking at the camera!).
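Here's a minimal PyTorch sketch of the idea - my "conceptual vectors", not any shipping codec. The architecture and sizes are arbitrary choices for illustration: squeeze each frame through a small bottleneck vector and repaint it at the far end, so only the vector has to cross the wire.

```python
import torch
import torch.nn as nn

class ConceptualVectorCodec(nn.Module):
    """Toy autoencoder: the encoder distils a frame into a short latent
    vector (a 'conceptual vector'); the decoder repaints the frame from it.
    Only the latent needs to be transmitted."""
    def __init__(self, frame_pixels: int = 64 * 64, latent_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(frame_pixels, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),            # the 'conceptual vector'
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, frame_pixels), nn.Sigmoid(),
        )

    def forward(self, frame: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(frame))

codec = ConceptualVectorCodec()
frame = torch.rand(1, 64 * 64)         # one flattened greyscale frame
latent = codec.encoder(frame)          # 128 numbers instead of 4,096 pixels
reconstruction = codec.decoder(latent)
print(latent.shape, reconstruction.shape)
```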
As AI-encoded video gets more sophisticated, it will be able to interpret more and more about a scene. For example, it will become possible to infer a 3D model of the scene from a single camera view.
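Part of this is already routine: monocular depth estimation recovers per-pixel depth - the raw material for a 3D model - from one camera. Here's a short sketch using Intel's publicly available MiDaS model via torch.hub, following the usage in the MiDaS README (the input filename is hypothetical):

```python
import cv2
import torch

# Load the small MiDaS monocular depth model and its matching transform.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = transforms.small_transform

img = cv2.imread("street_scene.jpg")               # hypothetical frame
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))
    # Resize the depth map back to the source resolution.
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()

print(depth.shape)  # relative depth for every pixel, from a single view
```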
Let's say the AI knows that it's looking at a 2021 Ford Mustang Mach-E, and because it knows this, it can build the model from a blend of what the camera sees and its own recalled information.
How authentic is this? In the sense of it being an accurate Mustang Mach-E, almost perfectly accurate. From the plaintiff's point of view, in a legal action against a defendant accused of driving into the back of his or her car, it's almost entirely useless - inauthentic, in other words. That's because the AI video recording will completely miss the damage at the back of the car.
I would argue that it's no different to what would happen with a regular camera that's simply shooting the wrong part of the car. But what does matter is that an immersive 3D model of the vehicle, generated by AI and presented as evidence, might mislead the judge and lead to the wrong outcome in the case.
All of this means that it is essential to flag the parts of the video that are AI-generated so that they are not used as evidence.
At which point, I refer you back to the beginning of this article: any such authenticity flag will be the first thing that hackers and fraudsters tamper with and take advantage of.
I think AI is already a fantastic tool for video enhancement, and I have no doubt that AI codecs will bring us even better video. What I'm not sure about is how long it will take us to develop a culture around the authenticity of video. It's already a serious issue.
Last week, we started to see reports that Nvidia's keynote speech from earlier in the year, presented by CEO Jensen Huang, was generated by an AI simulation of him. Lots of people believed this, with varying degrees of amazement or indifference. But, of course, the reality was a little different: only a 14-second Tron-style fantasy sequence was computer-generated. What matters here is that most people believed the entire two-hour presentation was the product of AI.
That means we are already predisposed to believe that AI can achieve results way beyond its actual current capabilities. Equally, sometimes we are genuinely fooled by AI-generated video. There's a massive disconnect: we already can't tell fact from fantasy, and I humbly suggest that's a problem that goes way beyond the world of video.