RedShark Replay: Computational photography techniques are now commonplace on smartphones, allowing tiny sensors to achieve amazing results. Phil Rhodes asks: why hasn't computational cinematography become normal in bigger cameras yet?
Isn't it a bit unfair that cellphones can do such amazing things with such basic cameras? Sony's IMX586 sensor has just beaten the world record for smartphone cameras with a 48-megapixel resolution, and it's only 8mm across. In cinematography, we consider 33-megapixel (that is, roughly 8K) pictures to be where it's at, or even a bit beyond it right now, and it seems we increasingly prefer sensors bigger than a super-35mm film frame to do that.
Phones seem to pull off incredible feats of sensitivity and noise performance that belie their fairly modest camera technology. OK, the IMX586 uses some stacked sensor technology that's fairly up to the minute, but its photosites are still only 0.8 micrometres across, something like a tenth the size of those on an Alexa LF.
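A quick back-of-the-envelope check, taking the Alexa LF's photosite pitch to be roughly 8.25 micrometres (the exact figure isn't the point here, and these numbers are approximate):

```python
# Rough photosite comparison; figures are approximate.
phone_pitch_um = 0.8     # Sony IMX586 photosite pitch
alexa_pitch_um = 8.25    # Alexa LF photosite pitch, roughly

linear_ratio = alexa_pitch_um / phone_pitch_um   # ~10x wider photosites
area_ratio = linear_ratio ** 2                   # ~100x the light-gathering area

print(f"Linear ratio: {linear_ratio:.1f}x, area ratio: {area_ratio:.0f}x")
print(f"8K UHD pixel count: {7680 * 4320 / 1e6:.1f} megapixels")
```

A tenth of the width is a hundredth of the area, which is why the comparison feels so lopsided.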
It's as well to bear in mind that the Alexa may well have ten times the noise performance of the little Sony smartphone camera. Stops of dynamic range (or noise in decibels) are on a logarithmic scale, so if the Alexa did have ten times the noise performance, that'd only be a hair over three stops more effective dynamic range. It's entirely feasible that a performance gap of that size exists. But it doesn't instinctively feel like that. It feels like the phone is doing something clever.
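For the curious, the arithmetic is just a base-two logarithm, assuming the tenfold figure refers to a straight ratio of signal to noise:

```python
import math

noise_advantage = 10  # hypothetical 10x better signal-to-noise ratio
stops = math.log2(noise_advantage)
decibels = 20 * math.log10(noise_advantage)
print(f"{noise_advantage}x = {stops:.2f} stops, or {decibels:.0f} dB")
# prints: 10x = 3.32 stops, or 20 dB
```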
As we discovered all the way back in 2015, it's now quite normal for cameras to be doing, well, something clever. Noise reduction goes without saying. Advanced tricks such as taking multiple shots and averaging them are almost as common, and massively improve low-light pictures. The question is why we haven't really seen this sort of thing begin to emerge in cinema cameras. On film and TV productions – and frankly, in almost every circumstance except cellphones – it's normal for pictures to be a fairly literal representation of what hit the sensor.
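As a minimal sketch of why the averaging trick works, assuming the noise in each frame is random and independent, stacking N frames knocks the noise down by roughly the square root of N:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a flat 18% grey target shot eight times with heavy random noise.
scene = np.full((480, 640), 0.18)
frames = [scene + rng.normal(0.0, 0.05, scene.shape) for _ in range(8)]

single = frames[0]
stacked = np.mean(frames, axis=0)   # simple frame averaging

print(f"Noise in a single frame:    {np.std(single - scene):.4f}")
print(f"Noise in the 8-frame stack: {np.std(stacked - scene):.4f}")  # roughly 1/sqrt(8) of the above
```

Eight frames cut the random noise by a factor of about 2.8, which is a big part of how phones make tiny photosites look respectable in low light.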
It's not as if the same sorts of ideas haven't been tried. Some modern CMOS sensors can read out the same frame twice, after different exposure periods. That allows for a “darker,” shorter-period exposure, and a “brighter,” longer-period exposure that can be combined mathematically later to produce a high dynamic range picture. That's certainly a type of computational photography. The problem is that subjects in the frame may move between the two exposures, creating strange fringing around those moving objects; cinema cameras using that two-exposure approach have been tried, but it was often felt that the downsides outweighed the upsides.
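The maths of the combination itself is not the hard part. A minimal sketch, assuming linear sensor data and a known exposure ratio (the function and values here are illustrative, not any manufacturer's actual processing), might look like this:

```python
import numpy as np

def merge_dual_exposure(long_exp, short_exp, ratio, clip=0.95):
    """Combine long and short linear exposures of the same frame.

    long_exp, short_exp: linear sensor data scaled 0..1
    ratio: how much more exposure the long readout received (16 = four stops)
    clip: level above which the long readout is treated as blown out
    """
    short_scaled = short_exp * ratio      # bring the short exposure onto the long exposure's scale
    return np.where(long_exp < clip,      # trust the cleaner long readout...
                    long_exp,
                    short_scaled)         # ...except where it has clipped

# Illustrative example with a four-stop spread between the two readouts.
rng = np.random.default_rng(1)
scene = rng.uniform(0.0, 8.0, (4, 4))        # radiance, highlights up to 8x the long readout's range
long_exp = np.clip(scene, 0.0, 1.0)          # long readout clips the highlights
short_exp = np.clip(scene / 16.0, 0.0, 1.0)  # short readout keeps them
print(merge_dual_exposure(long_exp, short_exp, 16.0))
```

The hard part, as above, is that the two readouts don't see the same instant in time.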
That is, of course, basically what a cellphone is doing when it takes several exposures for a still image. Yes, those exposures will be taken using a camera mode that's designed to capture a lot of frames as quickly as possible, to reduce fringing, but the phone also relies on software to recognise objects from frame to frame and apply adjustments – distortions, essentially – so that objects in all the frames line up accurately.
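A very stripped-down version of that registration step, assuming the motion between frames is a simple global shift (real scenes need local, non-rigid warps, which is where the “distortions” come in), could be sketched like this:

```python
import numpy as np

def estimate_shift(ref, frame):
    """Estimate the (row, col) shift that, applied with np.roll, maps frame back onto ref."""
    f_ref = np.fft.fft2(ref)
    f_frm = np.fft.fft2(frame)
    cross_power = f_ref * np.conj(f_frm)
    cross_power /= np.abs(cross_power) + 1e-12   # phase correlation: keep phase, discard magnitude
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Wrap the peak position into a signed shift.
    return [p if p < s // 2 else p - s for p, s in zip(peak, correlation.shape)]

def align_and_stack(frames):
    """Align every frame to the first one, then average the burst."""
    ref = frames[0]
    aligned = [ref]
    for frame in frames[1:]:
        dy, dx = estimate_shift(ref, frame)
        aligned.append(np.roll(frame, (dy, dx), axis=(0, 1)))
    return np.mean(aligned, axis=0)
```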
Could this be done with production cameras? Sure. Has it? Not beyond things like the more advanced sorts of noise reduction and contour-tracing approaches to Bayer demosaic processing.
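For context, even that demosaic step has a computational flavour: the interpolation is steered along image contours rather than across them. A heavily simplified sketch of edge-directed green reconstruction (illustrative only, not any camera maker's actual algorithm):

```python
import numpy as np

def green_edge_directed(bayer):
    """Reconstruct the green channel from an RGGB Bayer mosaic.

    At each red or blue photosite, green is interpolated along the
    direction (horizontal or vertical) with the smaller gradient, so we
    interpolate along contours rather than across them.
    """
    bayer = np.asarray(bayer, dtype=float)
    h, w = bayer.shape
    green = bayer.copy()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if (y + x) % 2 == 1:
                continue                      # already a green sample in RGGB
            grad_h = abs(bayer[y, x - 1] - bayer[y, x + 1])
            grad_v = abs(bayer[y - 1, x] - bayer[y + 1, x])
            if grad_h < grad_v:
                green[y, x] = (bayer[y, x - 1] + bayer[y, x + 1]) / 2
            else:
                green[y, x] = (bayer[y - 1, x] + bayer[y + 1, x]) / 2
    return green
```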
We can only hypothesise why a particular company might or might not have chosen to include a particular feature. It's certainly possible to put cellphone-style features in production cameras; cellphone hardware is inexpensive in that context. Beyond that, two possibilities spring to mind.
First is simply reliability. It is possible to cause at least some cellphones to create broken, distorted images if certain changes happen during the burst of exposures used to take a single low-light photo, and many other computational techniques suffer similar problems in what a software engineer might call an edge case. It's likely this sort of thing will improve, perhaps with the assistance of AI, although a production camera has a rather lower tolerance for error than a pocket snapshot.
The second issue is less of a technical concern, and more one of business and economics. One of the reasons cellphones get these advanced techniques is that they're one of the most mass-market pieces of technology ever released. In 2017 Google announced that there were two billion “monthly active” Android devices in the world; IDC suggested that 1.24 billion of them were shipped in that year alone. That's a very, very, very large userbase across which to amortise the costs of developing these things. Even a particularly popular production camera is likely to sell far fewer than a million units in total, making development time a much bigger deal.
It remains likely that computational techniques will become more popular in production cameras, particularly things like lightfield arrays. That's a rather more deterministic sort of computational imaging, where the errors are more likely to appear as noise around the edges of depth-separated objects as opposed to giving a man three arms for a couple of frames in a row. The dividing line between “clever compression mathematics” and “computational imaging” is (if you'll excuse the double entendre) blurred.
Regardless of how it's done, though, questions of economics and development time are problems for the manufacturers; the biggest issue is whether these techniques can be applied without creating an unacceptable risk of errors and artefacts. We already tolerate errors and artefacts to achieve reasonable record times on smallish flash cards, we tolerate short depth of field to achieve better light transmission through lenses, and we've even come to love the artefacts created by 24fps imaging itself. What we'll tolerate in return for what benefit has already been shown to be pretty variable, but it's no great leap to assume that significant improvements in noise and sensitivity might be worth tradeoffs.
After all, with sensor manufacturers now advertising read noise at the two-electron level, we're starting to approach some fundamental limits. Create a sensor that can reliably read every single photon – which isn't an order of magnitude away from where we are right now – and it is fundamentally impossible to improve sensitivity any further by improving the sensor. Improve noise reduction, multiple-frame integration and other techniques, though, and suddenly that starlight vision camera becomes... well, if not possible to make, at least possible to simulate convincingly enough that everybody will want to own one.
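To put rough numbers on that limit: even a hypothetical perfect photon-counting sensor is still stuck with the shot noise of the light itself, which only grows as the square root of the signal, and a two-electron read noise stops mattering much once more than a few dozen electrons arrive.

```python
import math

def snr_db(signal_electrons, read_noise_electrons):
    """Signal-to-noise ratio for a single photosite: photon shot noise plus read noise."""
    noise = math.sqrt(signal_electrons + read_noise_electrons ** 2)
    return 20 * math.log10(signal_electrons / noise)

for signal in (4, 16, 100, 1000):
    perfect = snr_db(signal, 0)   # hypothetical photon-counting sensor
    real = snr_db(signal, 2)      # roughly the read noise now being advertised
    print(f"{signal:5d} e-: perfect {perfect:5.1f} dB, with 2e- read noise {real:5.1f} dB")
```

Beyond that point, the only routes to a cleaner picture are collecting more light, or getting clever in software.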
Image: Shutterstock - El Nariz