RedShark Replay: We all know that 8K is most certainly already here. But here’s another chance to read this controversial article, which remains one of the most-read pieces ever published on RedShark. What do you think?
Living memory is a wonderful thing. For anyone born in the last sixty years, it encompasses more change than any previous generation has ever seen. And one of the things we’ve got used to is increasing resolutions.
From our perspective, today, it all seems to have happened very quickly. Anyone working in professional video today will have very clear memories of Standard Definition video. Some of us still use it! But the current working paradigm is HD.
Next on the horizon is 4K. And, with almost unseemly haste, we’re already talking about 8K. In fact, some organisations, like Sony and the BBC, kind-of lump together any video formats with greater than HD resolution, using expressions like “Beyond Definition” (although in Sony’s case, that also means that resolution isn’t everything and that there are other factors like increased colour gamut and contrast that matter as well).
Everyone wants better pictures. There’s nothing wrong with the principle that - all things being equal - if you can record your images in a high resolution format, then you probably should.
The idea of digital video is now so well established that it has virtually passed into folklore. At the very least, the word “pixel” is bandied around as if it had always been part of the language.
In reality, it’s not been around for very long. Cathode ray tubes don’t use pixels, and nor do VHS recorders or any type of analogue video equipment.
Before pixels came along, video was recorded as a continuously varying voltage. It wasn’t quantized, except, arguably, by the end of a scanning line and the completion of a video field.
Digital video is exactly that. It’s video represented by digits. It’s rather like “painting by numbers” except that rather than representing an image by drawing lines that separate distinct colours, a regular grid is imposed on the picture. Each element in the grid is a pixel, and it is allocated a number that can be used to look up the colour under that part of the grid. It really is that simple.
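If it helps to make that concrete, here’s a toy sketch in Python (the grid and palette are invented purely for illustration) of a bitmap as nothing more than a grid of numbers used to look up colours:

```python
# "Painting by numbers": a bitmap is a grid of numbers, each one an
# index into a palette of colours. The 4x4 grid and the palette here
# are made up for illustration.

palette = {
    0: (0, 0, 0),        # black
    1: (255, 255, 255),  # white
    2: (200, 30, 30),    # red
}

# Each element of the grid is one pixel.
bitmap = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [1, 2, 2, 0],
    [1, 1, 0, 0],
]

def colour_at(x, y):
    """Look up the colour under grid position (x, y)."""
    return palette[bitmap[y][x]]

print(colour_at(1, 1))  # -> (200, 30, 30)
```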
But of course, it’s not the best way to represent an image. Nature isn’t made up of a grid, and even if it were, it wouldn’t match the superimposed pixel grid.
When you think about it, it really does take a stretch of the imagination to understand how something as subtle and organic as a flower can be represented by a string of binary digits. The two concepts might as well exist in different universes. And actually they do: the analogue domain and the digital domain.
But the miracle of digital video is that if you have enough pixels, you won’t notice them. Your mind sees the digital image as if it were an analogue one, as long as you don’t get too close.
That’s the thing. If you don’t have enough pixels and you’re sitting too close, you’ll be able to see the grid.
Most people reading this know this stuff already, and I’m reiterating this part of the theory simply to show that pixels aren’t ultimately the best way to represent images. Yes, if you go to HD for “normal” sized TVs in the living room, it looks good; great, even. And if you want a TV that’s twice that size (and four times the area) then it absolutely makes sense to move to 4K.
There are genuine reasons why you might want to have 8K. For example, even if you can’t see the individual pixels in HD or 4K, if you look closely at diagonal lines, you can see jagged edges, and the closer the line is to horizontal or vertical, the worse it gets. You could even say that aliasing magnifies the pixelation by making it more noticeable. Quadrupling the number of pixels reduces this.
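To see the effect for yourself, here’s a toy Python sketch (not any real renderer) that rasterises a near-horizontal line onto a coarse grid; the stair-steps it prints are exactly the jagged edges described above:

```python
# Rasterising the line y = 0.2 * x onto a coarse pixel grid.
# The shallower the slope, the longer the flat runs between the
# sudden one-pixel jumps - which is why near-horizontal lines
# look the most jagged.

def draw(width, height, slope):
    rows = [["." for _ in range(width)] for _ in range(height)]
    for x in range(width):
        y = round(slope * x)
        if y < height:
            rows[height - 1 - y][x] = "#"  # fill the nearest pixel
    return "\n".join("".join(r) for r in rows)

print(draw(24, 6, 0.2))  # prints a visible staircase
```

Double the grid density and the steps are still there, but each one is half the size relative to the image - which is the sense in which more pixels help.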
There are a number of developments that make me think that not only are there other improvements that will be of more benefit than the pain and expense of moving up to 8K, but that perhaps we might move away from having pixels at all.
Now, please note that we will probably always have to have pixels when it comes to displaying pictures. Unless we invent some organic, non-grid-based way of displaying video, we will always see recorded or transmitted images through a spatially quantized grid. But what will, I think, change radically, is how we store the video.
What I think will happen is that we will move towards vector video.
If you’re a graphic artist, or if you’ve ever played or worked with Corel Draw or Adobe Illustrator over the last thirty years or so, then you’ll be familiar with the distinction between vector and bitmap images. A bitmap is the familiar grid of pixels, each with a set of numbers that describes the colour of the individual square. A vector is completely different. Instead of explicitly stating the colours of each and every part of the object, a vector is a description. You could almost think of it as extremely detailed metadata.
Here’s a very simple example. Think of a capital letter “I”. In a sans-serif font like this, it’s about as easy an object to describe as you can get. In natural language it would be something like “a black, vertically oriented rectangle, about 5mm high and 0.5mm wide”. That’s it. If you follow that description, you get a perfect “I”. You don’t have to worry about pixels: there’s enough information in that description. The shape and form of the object is drawn from the meaning of the description; nothing more.
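To put that in code rather than natural language, here’s a minimal Python sketch (the field names and millimetre units are hypothetical, chosen to mirror the description above) of the “I” as a description instead of a grid:

```python
# The natural-language description of a capital "I", written as data.
# The field names and units are hypothetical; the point is that the
# shape is a description, not pixels.

letter_I = {
    "shape": "rectangle",
    "colour": "black",
    "width_mm": 0.5,
    "height_mm": 5.0,
}

def scaled(glyph, factor):
    """Scale the description - the shape stays exact at any size."""
    g = dict(glyph)
    g["width_mm"] *= factor
    g["height_mm"] *= factor
    return g

print(scaled(letter_I, 10))  # a perfect "I", 50mm tall, still no pixels
```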
The world is, of course, full of somewhat more complicated objects than this. Even the lower case version of “I” calls for a much longer description. You’d have to say that the black vertical rectangle is somewhat shorter, and has a black circle above it, which is the dot.
You might wonder how on earth you go about describing a letter “g” or “k”. But all you need to know at this stage is that it is indeed possible, and the proof is that this is exactly how fonts and typefaces are described in font files.
The very big advantage of this method is that if you have a good description of an object, you can make it as big or as small as you like and you won’t lose any detail. It’s only when you show it on a monitor or screen that you have to bring pixels back into it, and the software in your system takes care of that. Here’s the most important thing: your system will always recreate vector-described objects at the optimum resolution for your display. A circle is a circle whatever size or resolution you reproduce it at. Vectors are not stored at SD, HD, 4K or 8K resolution, any more than a wild animal is composed of a grid of pixels. An idea is not a bitmap image.
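Here’s a toy sketch of that “pixels only at display time” idea, again in Python (the rasterise function below is a stand-in for what real display software does, not a real API):

```python
# The same vector rectangle rasterised at whatever density the
# display needs. Pixels only appear at this final step.

def rasterise(rect_w, rect_h, px_per_unit):
    """Turn a w x h rectangle (in abstract units) into a pixel grid."""
    cols = round(rect_w * px_per_unit)
    rows = round(rect_h * px_per_unit)
    return [[1] * cols for _ in range(rows)]  # 1 = a filled pixel

low = rasterise(0.5, 5.0, 4)     # a coarse display
high = rasterise(0.5, 5.0, 40)   # a much denser display

print(len(low), "x", len(low[0]))    # 20 x 2
print(len(high), "x", len(high[0]))  # 200 x 20 - same shape, more pixels
```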
This might work with distinct objects, but what about more complex objects and scenes?
There is a process called “autotracing” which can extract the essence of a picture and turn it into what is, essentially, a description. Imagine putting tracing paper over an image, drawing over all the visible lines, and then shading or colouring in the spaces between the lines with all the appropriate gradients. Every straight line can be described using its length and direction. Every curve, or combination of curves, can be described precisely using “Bézier” techniques. There is, in theory, no scene that can’t be described like this.
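For the curious, this is all a Bézier curve is: a handful of control points that describe the whole curve exactly. A minimal Python sketch (the control points are arbitrary):

```python
# A cubic Bezier curve: four control points describe the curve
# exactly, and you can sample it as finely as any output
# resolution demands.

def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate the curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

# Eight numbers describe the whole curve, however many points
# (or pixels) you eventually draw it with.
points = [cubic_bezier((0, 0), (1, 3), (3, 3), (4, 0), i / 10) for i in range(11)]
print(points[5])  # the curve's midpoint: (2.0, 2.25)
```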
It all sounds pretty simple but the reality is likely to be far from that. The issue is that some scenes are incredibly complicated - especially when you are talking about very high resolution images.
And, of course, we’re not just talking about still images here. So how would this technique apply to video?
In exactly the same way, except that there are some even bigger advantages.
As we mentioned above, once the video is encoded as a vector description, it can be decoded at any resolution. Every line, shape and gradient will look pin sharp, even at 8K, 16K or 32K. The only thing determining the quality of the picture - no matter what the resolution - is the quality and accuracy of the vector description.
There’s another huge advantage. Frame rates are no longer an issue. That’s because in addition to mapping vectors within a frame, a vector description would track these objects as they move in time - in much the same way as MPEG and H.264 do today - except that these movement vectors would be much more accurately notated.
You could output this video format at absolutely any frame rate without losing any resolution. You could even speed up the output frame rate when there’s a lot of action, and slow it down when things calm down again.
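Here’s a toy sketch of that frame-rate independence (the straight-line motion model is deliberately simplistic; a real codec would need far richer descriptions):

```python
# If motion is stored as a description - here just a start point
# and a velocity - frames can be generated at any timestamp, so
# no frame rate is baked into the recording.

def position_at(start, velocity, t):
    """Where the described object is at time t (seconds)."""
    return (start[0] + velocity[0] * t, start[1] + velocity[1] * t)

start, velocity = (0.0, 0.0), (100.0, 25.0)

# The same description played out at two different frame rates:
for fps in (24, 120):
    frames = [position_at(start, velocity, n / fps) for n in range(fps)]
    print(fps, "fps ->", len(frames), "frames in the first second")
```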
Is this really possible? Is it practical?
Largely, yes, and yes.
We’ve already seen this video of some work by the University of Bath to create exactly what we’re talking about here - a vector-based video codec.
And there is already a sense in which we have vector video, in the form of CGI animations. 3D animations are based on models that are nothing more than 3D vector descriptions, with the addition of textures. It would be very easy to build a driver that would output these as vector video.
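As a hint of how simple that driver could be in principle, here’s a toy Python sketch (a bare pinhole projection, with an arbitrary focal length) that turns a 3D vector model into a 2D vector description, still with no pixels in sight:

```python
# A 3D model is just points and the connections between them.
# Projecting the points yields 2D vectors that can be rendered
# at any resolution later.

triangle_3d = [(0.0, 1.0, 5.0), (-1.0, -1.0, 5.0), (1.0, -1.0, 6.0)]

def project(point, focal=2.0):
    """Pinhole-project a 3D point onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

triangle_2d = [project(p) for p in triangle_3d]
print(triangle_2d)  # still a description - pixels only at display time
```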
It’s very hard to say if and when this will happen. My feeling is that there is a long way to go before we have a vector-based video codec that can rival the quality of 4K and 8K. You'd probably need a hundred-fold increase in processing power to abolish pixels. That sounds like a lot, but it's only about ten years at the current rate of progress.
8K poses such challenges - especially in its storage and transmission - that I can’t help thinking we would be better off looking at more effective ways to encode video. And if we can do it without resorting to pixels, then I'll be very pleased.
There's much more to say on this subject, and several developments that suggest it is going to happen. Watch out for more soon in RedShark.
Read: Video without Pixels - the Debate