Replay: Phil Rhodes explains why luminance is so important within photography
The way in which electronic cameras encode luminance values has been an issue for engineers ever since the medium's birth. If you wanted a television system – that is, a way to send moving pictures by radio – the most obvious solution would be to take an electronic device that is sensitive to light and send its output (or the output of many such devices) as a variation in the strength of a radio signal. Or, if storage was the goal, to record that output as the strength of a magnetic field on tape.
The problem with such simplistic approaches is that things are very rarely linear – the output of your light-sensitive piece of electronics might not precisely double if you double the amount of light falling on it, and the light output of your reproduction device might not conveniently double if you double the signal level. Although some light-sensitive devices do behave almost that straightforwardly (CCDs, for example), what's certainly true is that the human eye isn't anywhere near linear: if we double the amount of light coming out of something, it doesn't look anything like twice as bright. If we take a series of grey blocks that successively double in luminance, they'll appear to us as a nice even ramp, with the same apparent difference in brightness between any adjacent pair of blocks. It is for this reason that photographic stops each represent a doubling of luminance, yet look like a series of consistent increases. We perceive the difference between, say, F/4 and F/2.8 as the same increase in brightness as the difference between F/8 and F/5.6. In terms of absolute light intensity, though, the step from F/4 to F/2.8 represents a considerably larger increase than the step from F/8 to F/5.6.
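To make the arithmetic concrete, here's a small Python sketch, assuming the usual approximation that the light admitted by a lens is proportional to 1/N² and ignoring transmission losses:

```python
# Relative light gathered at a few F-numbers: admitted light is roughly
# proportional to 1 / N^2, so each full stop doubles it.
f_numbers = [8.0, 5.6, 4.0, 2.8]
relative_light = [1.0 / (n * n) for n in f_numbers]

base = relative_light[0]  # normalise against F/8
for n, light in zip(f_numbers, relative_light):
    print(f"F/{n:g}: {light / base:.1f}x the light of F/8")

# Prints roughly: F/8 -> 1.0x, F/5.6 -> 2.0x, F/4 -> 4.0x, F/2.8 -> 8.2x.
# The step from F/8 to F/5.6 adds one unit of absolute light; the step from
# F/4 to F/2.8 adds four, yet both look like the same one-stop increase.
```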
This causes a problem with precision, by which we mean that we may not be using enough of our signal – in digital terms, not enough individual code values – to represent certain differences in brightness. Assuming that we are storing an image in an 8-bit file, all of the theoretically infinite variations in luminance in a scene must be stored as one of 2⁸ = 256 levels. If the image data linearly represents the absolute amount of light in the scene, a situation we call linear light, an excessive number of these levels would be used to encode what, to us, looks like a very small increase in brightness at the top of the scale. In photographic terms, considering that each F-stop represents a doubling of the absolute amount of light, half of the available values (the top 128) would be used to encode the brightest single F-stop's worth of information. On an everyday camera capable of perhaps 12 stops of dynamic range, the other 11 stops, containing the vast majority of the image, would be encoded using the lower 128 values. Were we to store a linear-light image this way, the darkest parts of the image would suffer from awful banding (properly, quantization noise), as the small number of digital values used to store shadow detail would be forced to represent widely-spaced differences in brightness. A similar problem exists – or existed – in the analogue world, where very small signals could be subject to excessive noise and would hence degrade the image in shadowy areas.
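The lopsided way a linear 8-bit encoding spends its code values can be demonstrated with a short, simplified sketch, assuming an idealised 12-stop scene that fills the whole 0–255 range:

```python
# Count how many 8-bit code values fall within each stop of an idealised
# 12-stop scene stored as linear light. Stop 0 is the brightest stop.
levels = 256
stops = 12

for stop in range(stops):
    top = levels / (2 ** stop)            # upper boundary of this stop
    bottom = levels / (2 ** (stop + 1))   # lower boundary (half the upper)
    print(f"stop {stop:2d}: about {int(top) - int(bottom)} code values")

# The brightest stop gets 128 of the 256 values; by the ninth or tenth stop
# down there is barely a single distinct value left, which is where the
# banding described above appears.
```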
The most straightforward solution would be to boost the level of the darkest areas of the scene before transmitting them, while leaving the brightest areas alone. The technical implementation of this is referred to as gamma correction, because the mathematics used are a power law, in which the input is raised to a given power to create the output, and the exponent is conventionally represented by the Greek letter gamma. The result is that a graph of signal level against light level is pronouncedly curved, very approximately like applying the Curves tool in Photoshop, grabbing the middle of the curve and dragging it upward. Without gamma encoding, uncorrected linear-light images look very dark and gloomy when displayed on conventional monitors that expect gamma-corrected signals.
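As a rough sketch of the idea – a pure power law with an illustrative exponent of 1/2.2, not the exact curve of any broadcast standard, which typically adds a linear segment near black:

```python
# Textbook power-law gamma: encoded = linear ** (1 / gamma).
# Real transfer functions such as BT.709 or sRGB add a linear toe near black,
# so treat this purely as the idealised form described above.
GAMMA = 2.2  # illustrative value only

def gamma_encode(linear, gamma=GAMMA):
    """Map a linear-light value in 0..1 to a gamma-encoded signal in 0..1."""
    return linear ** (1.0 / gamma)

def gamma_decode(signal, gamma=GAMMA):
    """Invert the encoding, recovering (approximately) linear light."""
    return signal ** gamma

# A mid-grey of about 18% in linear light lands near the middle of the
# encoded range rather than far down in the shadows: the "drag the middle
# of the curve upward" effect.
print(round(gamma_encode(0.18), 3))  # ~0.459
```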
Monitor technology complicates this situation. Traditional electronic displays – CRTs, mainly – did not have anything like a linear response, which simply means that doubling the signal input didn't make the tube emit twice as many photons. Through a bit of clever engineering and a good deal of luck, the power-law gamma encoding used to make linear-light images workable in analogue television broadcasting turns out to be counteracted, with surprising accuracy, by the nonlinearity of a CRT monitor, with any remaining error trimmed out electronically and the resulting image having appropriate luminance.
I use the word “appropriate” here because anyone who's ever looked at a monitor on a camera, then glanced at the actual scene, is well aware that the brightness and colour of the two rarely, if ever, look particularly similar. Until the advent of serious digital cinematography, we relied upon the cooperative design of cameras and displays to ensure that no matter what we shot, the result would be, if not precisely like the original scene, at least watchable and free of egregious errors.
Given this situation, it's ironic that the zenith of sensor technology during the standard-definition TV era was the charge-coupled device, or CCD, which actually has an output that's quite close to linear. Double the exposure falling on a CCD and, until it clips at maximum output, its output will very nearly double; it is a linear-light device. This is not to say that the actual output of a real-world CCD-based camera has these characteristics, because the camera must internally gamma-encode the images coming from its sensor so that they're compatible with the rest of the world's equipment. Of course, standard displays are generally not CRTs any more – they're more likely to be TFT liquid-crystal displays, which have their own non-linearity, but with electronics that alter the signal supplied to them until they appear to behave like those old-style cathode ray tubes. So, until relatively recently, and still in broadcast television, we have had TFT monitors pretending to be cathode ray tubes, so that cameras – which are actually quite linear, but designed to drive a long-obsolete type of display – produce pictures that look right. It is perhaps not surprising that the world of cinematography prefers to avoid these machinations.
Beyond simple perversity of design, the reason that gamma-encoded images are a poor fit for digital cinematography is that, in order to create a viewable image without the need for manual colour grading – which is impractical in applications such as multi-camera studio production and electronic news gathering – certain assumptions must be made about how the image should look. This is particularly true of the brightest areas of the image, which are compressed the most by gamma encoding as the gamma curve flattens toward the top of the graph. Where a production does want to grade, options are limited by gamma encoding, especially since the highlight region is where electronic cameras have traditionally compared least favourably with the photochemical film stocks they're replacing.
Solving this problem is simply a matter of choosing not to gamma-encode the data coming from the imaging sensor, although storing linear-light information has its own concerns. First, and most critically, modern CMOS sensors – which is to say those in most current and upcoming digital cinematography cameras – don't actually produce linear-light data. We must be careful here, as real-world CMOS sensors may include a significant amount of image-processing electronics within the device, which can make the data appear linear. However, the actual light-sensitive elements of most CMOS sensors, which are photodiodes operated in reverse bias, produce a signal that increases in proportion to the square root of the amount of light hitting them (there are almost as many varieties of CMOS sensor as there are CMOS sensors, so these notes apply only to the most basic expression of the technology). Therefore, any CMOS camera claiming to store “raw” sensor information in anything approaching linear light must be processing that information to at least some degree. We must be cautious when considering what the performance of a sensor itself is, as opposed to the changes made to that performance by associated electronics, even if that processing is incorporated in the same physical piece of silicon as the sensor photodiodes.
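Purely as an illustration of that simplest photodiode model – and emphatically not a description of any real camera's processing – linearising a signal that really was proportional to the square root of the light would just mean squaring it:

```python
# Illustration only: if a sensor's raw output really were proportional to the
# square root of the light hitting it, recovering linear light would simply
# mean squaring the normalised signal. Real CMOS devices and their on-chip
# processing vary enormously, as noted above.

def linearise_sqrt_response(raw_signal):
    """raw_signal in 0..1, assumed proportional to sqrt(light)."""
    return raw_signal ** 2

# Doubling the light multiplies such a raw signal by sqrt(2), about 1.41,
# but squaring restores the expected factor of two in linear light.
one_unit = linearise_sqrt_response(0.5)              # 0.25
two_units = linearise_sqrt_response(0.5 * 2 ** 0.5)  # 0.5
print(two_units / one_unit)                          # 2.0 (approximately)
```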
As we saw above, performing gamma encoding before storing the image, rather than on playback, can make image storage practical in an 8-bit file. But as we also learned, it can make grading more difficult. There are two solutions: either use more bits, so that there are enough luminance levels available to overcome the problems of the brute-force approach of storing a linear-light image, or modify the image in some way that makes better use of the available digital luminance levels without compromising grading.
If we store an image with at least 12 bits of precision, providing 2¹² = 4096 luminance levels, it may become practical to work with a linear-light image. This is why the internal design of television cameras is often 14- or 16-bit: to provide enough precision to apply gamma to the (often nearly linear, for CCDs) sensor data directly, avoiding quantization noise and other precision problems associated with rounding off the results of digital mathematics. Eventually, in postproduction or in monitors on set, it will be necessary to apply some form of gamma-like modification to the data in order to view the image, but advances in data storage such as flash mean that storing 12-, 14- or 16-bit linear-light images is more practical than ever. The advantages of doing so are simpler, potentially slightly less power-hungry cameras and the ability to perform certain kinds of mathematics – such as colour balancing – without any preprocessing.
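A quick back-of-the-envelope comparison, using the same idealised 12-stop scene as before, shows why the extra bits matter:

```python
# Code values left for the darkest of 12 stops when linear light is stored
# at various bit depths; each extra bit doubles the total number of levels,
# and the darkest stop always receives the smallest slice.
stops = 12
darkest = stops - 1  # same convention as the earlier sketch

for bits in (8, 12, 14, 16):
    levels = 2 ** bits
    count = levels / (2 ** darkest) - levels / (2 ** (darkest + 1))
    print(f"{bits}-bit linear: about {count:g} code values in the darkest stop")

# 8-bit leaves the darkest stop a small fraction of a single code value;
# 12-bit gives it one, 14-bit four and 16-bit sixteen.
```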
The other solution to the problem of storing linear-light images is to apply some form of amplification to the darker values, representing shadow detail in the scene, such that they are represented by a reasonable number of digital luminance levels, but to do this without causing the problems of gamma correction. In this situation, “reasonable” might mean that a change in image brightness as perceived by the human visual system is represented by an equal change in the numeric luminance value, regardless of the absolute light level in the scene. To put it another, perhaps simpler way, we want the bottommost stop of the image to be represented by the same number of digital counts as the uppermost stop.
This ideal can be closely approximated using a logarithmic curve, matching the behaviour of F-stops, wherein each doubling of light appears to the eye as a consistent increase. Logarithmic encoding of images was originally developed by Kodak for its Cineon film-scanning system, and some cameras provide options to record images suitable for postproduction tools – colour grading systems, for instance – that expect Cineon-style data.
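The core idea can be sketched with a deliberately simplified log curve. This is not Cineon, nor any manufacturer's curve (as noted below, real log formats are rarely a plain logarithm), just a demonstration that a logarithmic mapping hands every stop the same share of the code values:

```python
import math

# A deliberately simplified logarithmic encoding: not Cineon or any camera
# manufacturer's curve, just an illustration of giving every stop the same
# number of code values.
STOPS = 12       # assumed dynamic range to be encoded
MAX_CODE = 255   # 8-bit output purely for the sake of the example

def simple_log_encode(linear):
    """Map linear light in (0, 1] to an 8-bit code, equal codes per stop."""
    linear = max(linear, 2.0 ** -STOPS)      # clamp to the darkest stop
    stops_below_white = -math.log2(linear)   # 0 at white, STOPS at black
    return round(MAX_CODE * (1.0 - stops_below_white / STOPS))

# Each halving of the light drops the code by the same amount (about 21
# values), so the bottom stop is described as finely as the top one.
for exposure in (1.0, 0.5, 0.25, 0.125):
    print(exposure, simple_log_encode(exposure))
```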
There are a number of potential stumbling blocks for users of all these techniques. First, the terminology is confusing: when the industry began using logarithmic encoding, universally referred to as “log”, it was natural to begin referring to non-log images as “linear.” But of course, most video images are gamma-encoded, often for display on conventional video hardware, and are nothing like linear with respect to the absolute amount of light in the original scene. It is for this reason that we use the special term “linear light” to refer to actually-linear images.
The final great confusion of all this is that real cameras very rarely produce truly linear, truly logarithmic, or truly linear-light data. The camera manufacturer does not generally have access to actual linear data to begin with, so anything described as “linear light” is more properly described as “data processed to approximate linear light.” This being the case, it is almost always necessary for camera manufacturers to perform processing on the image data, and in doing so they are often tempted to make changes which they feel make for better performance. This is very much the case in gamma-encoded conventional video cameras, which usually provide user-accessible features such as auto knee to allow for various different highlight characteristics. It is also the case in log-encoding devices, which is why postproduction people must concern themselves not only with what system is in use – log, linear, linear light – but also with what kind of log is in use. Logarithmic images are very rarely based on a simple logarithm of the linearised image data.
Given all this, one might yearn for a return to the days of television, video and CRTs, or perhaps to the world where film looked the way it looked based on how it was manufactured and processed. It's also reasonable to expect all this variability to settle down into standardisation, much as the various gauges of film did in the latter part of the 19th century and the first few decades of the 20th. Even so, this all represents a rather complex situation with the potential to cause people serious problems with images that just don't look right, and for the time being at least, a cautious and methodical approach remains necessary.
I'm indebted to David Gilblom of Alternative Vision Corporation for his notes on CCD and CMOS sensor performance which I used in the preparation of this article.