One day — soon — it may be possible to take almost perfect photos and videos with less-than-perfect lenses
Digital processing has been around for a long time. Photoshop first appeared at the end of the 1980s, but even before then, the mathematics to blur and enhance images and video had been known for decades. Now we're starting to use the abundance of digital processing available to us today to correct images from sub-par lenses, and even to optimise - and come close to perfecting - lenses that are already very good indeed.
In fact, some industry insiders are now saying that built-in lenses can compete with removable ones, because the correction built into the camera is based on complete knowledge of the lens - something you simply can't have if any lens can be fitted. Treating the lens and sensor combination as a "closed" system is always going to give you the best chance of correcting lens defects digitally.
Incredibly, some of the digital processing techniques that are in use in today's studios and edit suites were invented in the 1920s and 30s. They were only theoretical then, and it was only in the 60s and 70s, in forward-looking research establishments like IRCAM (Institut de Recherche et Coordination Acoustique/Musique) that computer music composition programs started to produce real results - even if it took a week of number-crunching on those early computers to generate a few seconds of synthesised sound.
Impossible
The rate of progress is truly incredible and, just as predicted by Raymond Kurzweil in his seminal book "The Singularity is Near", we are starting to see things happen that just a few years ago would have seemed impossible.
The slide rule was in use for over three hundred years until Hewlett-Packard released the HP-35 scientific calculator in the early seventies.
What has happened since then? This: our ability to calculate with a hand-held device has increased in the space of only forty-one years by several hundred million times.
The slide rule is just a single poignant example of the kind of change that surrounds us. There are dozens of other instances of this type of off-the-scale rate of progress.
So, what examples are there of things that were supposed to be impossible? And where is all this taking us? How about this, for example? (It's software called Melodyne, which "unmixes" music.) And of course there's Photoshop's Content-Aware Fill, which somehow "generates" new material to fit in - often seamlessly - where an image has been stretched or an object removed.
Hard to take in
In digital signal processing (that means audio as well as video) the amount of sheer computing power available is hard to take in. What would have taken a room full of servers ten years ago, and would simply have been impossible twenty years ago, is now available in a portable device that you can hold in the palm of your hand. Companies like Altera and Xilinx make chips with over three billion transistors in them. My family's first-ever colour TV, back in the seventies, had only 63 transistors (I counted them, in the service manual!).
So, if you can build a colour TV with only 63 transistors, think what you can do with forty-seven million times that number!
And that's ignoring the advances in software and communications as well.
That's enough processing power to take an incoming SDI feed of 1080 video and compress it into ProRes and store it on an SSD, in real-time. There are even bigger processors than that now and the trend shows no signs of slowing down. In fact, as predicted by Kurzweil, it's speeding up.
And now, we are starting to see things that really did look like magic a few years ago, and one of them is the subject of this article: digitally correcting lens aberrations so that you end up with a better image than you could expect through optics alone.
So, what does this mean, exactly?
What it means is that you can take a lens, digitise the images it creates, and apply mathematics to remove distortions, chromatic aberration and blur. If you take away these shortcomings, you end up with a perfect picture - subject only to the precision of the mathematics, the accuracy of its assumptions, and its "knowledge" of the lens.
Now, there may be some of you who aren't surprised by this, because we've been using software to apply lens correction for a long time. Perhaps the best example is the anamorphic lens, which deliberately squeezes the image horizontally to fit onto a narrower film or sensor; on playback, the image is stretched out again by the corresponding amount, so that the proportions match the original.
Of course, the anamorphic technique was originally an optical one, but with digital video, it becomes the subject of mathematics. And it only takes a pretty simple sum to convert a square pixel into a rectangular one.
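To make that concrete, here's a minimal sketch in Python of what the de-squeeze amounts to: resampling each row of the frame so it regains its intended width. The 2x factor, the function name and the use of plain linear interpolation are illustrative assumptions, not what any particular camera or editing package actually does.

```python
import numpy as np

def desqueeze(frame, squeeze_factor=2.0):
    """Stretch an anamorphically-squeezed frame back to its intended width.

    frame:          2-D (grey) or 3-D (colour) NumPy array, squeezed horizontally
    squeeze_factor: how much the lens squeezed the image (2x is a common figure)
    """
    height, width = frame.shape[:2]
    new_width = int(round(width * squeeze_factor))

    # For each output column, find the (fractional) source column and
    # blend linearly between its two nearest neighbours.
    src_x = np.linspace(0, width - 1, new_width)
    left = np.floor(src_x).astype(int)
    right = np.minimum(left + 1, width - 1)
    frac = src_x - left

    if frame.ndim == 3:                      # colour: broadcast over channels
        frac = frac[None, :, None]
    else:                                    # greyscale
        frac = frac[None, :]
    return (1 - frac) * frame[:, left] + frac * frame[:, right]
```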
Nothing magical
But there's nothing magical at all about this, and the reason it's so easy to understand how it works is because when the anamorphic lens introduces the distortion in the first place, it does it by a very accurately known amount. All you have to do to "correct" it is to apply that amount in reverse. Very little information is thrown away in the process (apart from some horizontal resolution).
Similarly, it's easy to correct certain lens distortions digitally where the image itself contains a reference that describes the extent of the unwanted change. An example would be where a wide-angle lens has distorted the shape of a building, or made it lean back instead of standing upright. The key point is that you know the building is supposed to be straight, so you can highlight a line that is supposed to be pointing directly upwards and tell the software "make it vertical".
It's where there's no such reference that it becomes really difficult, not to say impossible, until now.
Throwing away information
But there's a difference when lenses throw away information, which is what they do when they cause blur and, for example, chromatic aberration. The information isn't in there to get back. It's roughly the same problem as when you're mixing audio tracks together: you can't unmix them. Even though our ears can "separate" the singer from the string pad, the old adage that "noise is always additive" means that you can't just feed your mixed stereo pair back into a mixing desk and unmix it (although, strictly, now you can...).
Music, though, isn't just noise. It has a shape and form, which at the very least gives clues as to how it would have sounded unmixed. That's what our brains do, and there's no real reason why computers shouldn't do it either.
It's often easier to talk about audio processing than the video equivalent because it is more "linear". Once you understand an audio process, it's relatively easy to apply it to video. For example, with audio, we hear higher frequencies as a higher pitch. If you put a music track through a low-pass filter (in other words, one that takes out the higher frequencies and leaves the lower ones intact) then you will get the familiar effect where it sounds like it's coming through a brick wall, with the bass notes and drums booming and the higher pitches sounding decidedly muffled.
In video, this is analogous to blur. Imagine a chess board. Sharp edges between adjacent black and white squares represent the presence of high frequencies. If you put the image through a low pass filter, it will look blurred. There's less information there than you started with. You can boost what high frequencies are left, and that will sharpen the edges, but it's an artificial effect, not real. All it's saying is "make it look like there's sharpness here, based roughly on what's here already". What it's specifically not doing is burrowing into the image to retrieve information that's there but just not visible.
And then there's reverb, which is where you hear the original sound, followed by multiple reflections that are too densely packed together to be audible as separate echoes. Until now, it's been virtually impossible to remove reverb, because it's additive, and there's no way to subtract it.
But now, it is possible, and companies like iZotope are doing it. They're an audio plug-in company, and their latest generation product can remove reverb - something that's been thought to be impossible until now.
We don't know what their secret sauce is, but we're guessing that it's a technique often described as "reverse convolution" - deconvolution, to give it its more formal name. And it's directly analogous to what some other people are doing with lens distortion and video - although in this latter case, research is in its early stages.
I need to preface this with what might already be obvious by now: that I'm not an expert in any of this. But in mitigation, I have been around Digital Signal Processing for a very long time, and so some knowledge of it has gone into my brain through osmosis.
It's pretty simple
The way to understand reverse convolution is to understand non-reverse convolution: convolution, in other words. It's pretty simple.
Imagine someone regularly and slowly banging a drum in a cathedral. Most big, largely empty and unfurnished buildings have a long reverb time. If you make a sudden noise, you can hear it dying away, sometimes for up to thirty seconds. What's happening is that the sound is bouncing off the complicated surfaces in the building and being reflected a very large number of times. Each fraction of the sound goes through the same process, and its reverb merges with that of earlier and later fractions.
All of this adds up to an ongoing "mush" surrounding the original audio. If it's in the right proportion, it can subjectively enhance the sound. If there's too much of it, or if it's completely unintentional or unwanted, then it's a bad thing.
You can simulate reverb simply by running audio through a digital delay line with enough delay taps, feeding the output from those taps back into the line, so that the echoes become dense enough to merge into a convincing reverb effect.
Sympathetic
Or, you can make reverb using convolution, which is much better, and in some ways simpler. It's certainly much more accurate and sympathetic to the original acoustic space.
The first thing you have to do is "sample" the acoustic space you're in. It's a simpler process than it sounds. All you need to do is create a "click" that is incredibly short (ideally with a duration of one audio sample - so for a 48kHz recording, your "impulse" should be one forty-eight-thousandth of a second long). You record the click, and the reflections it creates. This recording represents a "snapshot" of the acoustic environment.
Then, in the convolution process - and this is a gross simplification - you take each sample of the original sound, use it to scale a copy of the impulse recording, and add that scaled copy into the output at that sample's position. You do this for every new sample, so every part of the recording has the "reverb" of the building added to it.
It works very well, and commercial reverb software allows you to add your own impulse responses, or use other people's. So, for example, if you wanted to know what your drumkit sounded like in the Taj Mahal, you just need to download an impulse recording from the famous Indian building, and off you go.
Reverse convolution is exactly this process done backwards. But of course it's not easy to get back to the original if you don't have the original impulse recording. You have to make assumptions, which means that the process is less than perfect.
Moving on to images, let's talk about lens distortion.
Imagine that you have a spot of light (or just a point on an image). Ideally, if you pass the light through the lens, this point will still be a point. The rest of the image might be bigger or smaller, or inverted, but, essentially, the point will still be a point.
If it's not a point, then the lens is distorting the image. The point might end up as a circle or an oval. It's likely that different parts of the lens will distort differently.
The way a lens distorts a point of light goes by the rather technical-sounding name of "Point Spread Function", or PSF. And the PSF is directly analogous to the impulse response that's used to simulate audio reverb. If you can measure the PSF accurately, then you have a chance of reversing it. Applying the process of reverse convolution to images can, theoretically, take us back to the original, undistorted image.
Real world research
And that's exactly what researchers like Felix Heide, Mushfiqur Rouf, Matthias B. Hullin, Björn Labitzke, Wolfgang Heidrich and Andreas Kolb at the Universities of British Columbia and Siegen have been trying - and largely succeeding - to do.
They took a very simple lens - a single convex piece of glass - attached it to a Canon DSLR, and succeeded in making significant and useful corrections to the distortions that, inevitably, such a simple lens produces.
You can read their paper and see their video at the end of this article, but, for now, and to avoid painful-looking mathematics, suffice it to say that what they've demonstrated is that, given time and sufficient processing power, the future for video and photography is looking very good indeed.
So what does this mean in practice?
It means that before very long, camera phones will be able to take really stunning photos with quite mediocre lenses. As sensor resolution increases, the lenses can be "sampled" and then corrections can be applied. Overall, since processing tends towards being free over time, lenses will get cheaper and pictures will be better.
Closed system
As we said at the beginning of this article, treating the lens and the sensor as a closed system is always going to give you the biggest opportunity to correct distorted images digitally. You could even argue that if you want to retain the flexibility of removable lenses, then you should include the sensor with the lens. That would relegate the role of the camera itself to essentially being a mount for the lens and the viewfinder. (If you think that's unlikely, what about this? It's Sony's Lens Camera, and it's exactly what we're talking about here. Don't dismiss this as a consumerist gimmick: I think it's possible - likely even - that this could be the future of cameras of all types. Not, perhaps, mounted to a smartphone, but as part of a completely modular camera system.)
But what about high quality lenses? Like the ones made by Fujinon and Cooke and so many other precision glass manufacturers?
Well, in my view, it depends what you want. In a few years we will have the capability to capture almost perfect video in very high resolutions. Lenses in the truest sense will be "transparent": in other words, they won't distort incoming images at all. Thanks to mathematics, it will be as if they weren't there at all.
Do we want this?
But do we want this? The question here isn't "aren't top-end lenses good enough already?" but "do we really want to throw away all the goodness and character of a lens that has its own personality?"
This reminds me of something I saw at the recent IBC show in Amsterdam. I was talking to Cooke Lenses on their exhibition stand, and I glanced across at one of their superb lenses attached to an Arri Alexa camera, which is noted for the "filmic" look of its sensor.
What I saw in the monitor attached to the camera was incredible. The image on the screen was hard to recognise as being a mundane shot of the exhibition hall. It looked like it was from a feature film, with rich, warm and smooth colours. The character of the lens and the sympathetic handling of the sensor complemented each other perfectly, and I found myself thinking that even if you could take a scientifically better image, you wouldn't want to.
Which means that even when we are able to produce the perfect picture with the help of advanced real-time mathematics and even more processing than we have today, it might be that not many people are tempted to use it.
Video and white paper on next page
White paper is here (Warning: PDF download).