Today we're talking about file formats again and the differentiation between containers and their contents. If you have ever felt that AVI (or MPEG, or MXF) files don't offer the same picture quality as Quicktime (or anything else), this should clear the matter up.
The idea of including, in a computer file, information describing the file's contents is naturally as old as digital storage itself. A single still image, in its simplest form, is just a list of numbers representing pixel brightnesses, but even in this simple case there's more to know – what's the resolution of the file – how many pixels high and wide is it? Is it in colour, and if so, what order are the red, green and blue components stored in?
Add things like variable precision into the mix, where an image might be stored with a variable number of brightness levels per channel, and things get really complicated. Computers handle files in groups of eight, sixteen, or thirty-two bits (or more); if we want a colour image with ten bits per channel, for 1,024 brightness levels in each of red, green and blue, each pixel is represented by thirty bits of data. It quickly becomes obvious that describing the contents of a file is a job all by itself.
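To make that concrete, here's a quick sketch in Python of packing three 10-bit channels into one 32-bit word; the channel order and bit layout are made-up assumptions for illustration, not those of any real file format.

```python
import struct

def pack_rgb10(r, g, b):
    """Pack three 10-bit values (0-1023) into one 32-bit word.
    The layout here is an arbitrary choice for illustration:
    bits 0-9 hold red, 10-19 green, 20-29 blue, and the top
    two bits simply go unused."""
    for name, value in (("r", r), ("g", g), ("b", b)):
        if not 0 <= value <= 1023:
            raise ValueError(f"{name} out of 10-bit range: {value}")
    word = r | (g << 10) | (b << 20)
    return struct.pack("<I", word)  # four bytes, little-endian

def unpack_rgb10(data):
    """Recover the three 10-bit channels from four bytes."""
    (word,) = struct.unpack("<I", data)
    return word & 0x3FF, (word >> 10) & 0x3FF, (word >> 20) & 0x3FF

# A mid-grey pixel: 512 in each channel, stored in four bytes.
pixel = pack_rgb10(512, 512, 512)
assert unpack_rgb10(pixel) == (512, 512, 512)
```

Two of the thirty-two bits per pixel go unused, and that's exactly the kind of detail a file has to describe about itself, because a bare stream of bytes won't tell you it's there.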
Fast-forward to the present day, and the situation can be very, very complicated. A video file contains images, of course, and in the simplest case they can be stored in much the same way as the stills of old, as a simple list of pixel values. Even then, though, there's likely to be at least one soundtrack, and probably other data such as timecode, reel names, take numbers and other camera-oriented metadata.
The simplest way to organise this would be to include all of those things within the file in order, perhaps by specifying the length of each of them – ten characters of reel name, say, followed by fifty characters of production name, followed by the images, which might themselves include information about height and width. Then, for every few frames, we might include a chunk of audio data. Oh, and timecode. Perhaps information about camera settings – was the image recorded in Rec. 709, or in a log mode? And what codec?
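As a sketch of how that naive approach might look (every field name and length here is hypothetical, not taken from any real format), consider something like this:

```python
import struct

# A purely hypothetical fixed layout, invented for this example:
# ten bytes of reel name, fifty bytes of production name, then
# width, height and frame rate as 32-bit integers, followed by
# the raw frames themselves.
NAIVE_HEADER = struct.Struct("<10s50sIII")

def write_naive_header(f, reel, production, width, height, fps):
    f.write(NAIVE_HEADER.pack(
        reel.encode("ascii")[:10],
        production.encode("ascii")[:50],
        width, height, fps,
    ))

def read_naive_header(f):
    reel, production, width, height, fps = NAIVE_HEADER.unpack(
        f.read(NAIVE_HEADER.size))
    return (reel.rstrip(b"\x00").decode("ascii"),
            production.rstrip(b"\x00").decode("ascii"),
            width, height, fps)
```

It works, right up until somebody needs to add a field: every addition shifts the offsets of everything after it, and every program that reads the format has to be updated in lockstep.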
Complexity and change
Not only does this get very complicated very quickly, but it's likely to need to change over time, as new techniques are developed and new types of data need to be stored. What's more, certain types of software might not need to look at some of the data – an audio program might not need to decode picture information at all, and many consumer-oriented video players don't care about timecode, for instance. So, there's a need for flexibility that isn't well served by simply trying to think up all of the things that could be required in a file, and stacking them up.
The first widely distributed attempts to solve this problem were things like AVI and Quicktime movies, which operate very similarly. The structure of the file is broken up into chunks (they are, quite literally, named chunks in the AVI file format), each of which is identified by a code – four characters long, in the case of AVI and Quicktime. In an AVI file, for instance, there's a chunk called “avih” which contains information about the video in the file – its resolution and playback rate, among other things – while the codec each stream is compressed with is described in separate stream header chunks.
The actual data representing the frames is stored in another chunk called “movi”. Sharp-eyed readers will have already noticed a problem here: because the header specifies a single playback rate for the whole file, it's impossible for an AVI movie to contain material at a variable frame rate (Quicktime instead specifies a display duration for each frame, so it can handle variable frame rates).
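On disk, a RIFF-style chunk is a simple thing: the four-character ID, a little-endian 32-bit payload size, the payload itself, and a pad byte if the payload length is odd. Here's a minimal sketch of building one; the “note” ID is made up purely for illustration.

```python
import struct

def make_chunk(fourcc, payload):
    """Build one RIFF-style chunk: a four-character ID, a little-endian
    32-bit payload size, the payload itself, and a pad byte if the
    payload length is odd (RIFF keeps chunks aligned to even offsets)."""
    if len(fourcc) != 4:
        raise ValueError("chunk IDs are exactly four characters long")
    data = fourcc.encode("ascii") + struct.pack("<I", len(payload)) + payload
    if len(payload) % 2:
        data += b"\x00"
    return data

# A made-up chunk ID carrying a made-up payload, purely for illustration.
chunk = make_chunk("note", b"some metadata")
```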
Any player for the AVI video file format will need, by definition, to understand avih and movi chunks, but what if such a player were to encounter a chunk (strictly a list, but it's still a chunk of data) called “Tdat”? Well, “Tdat” is the chunk ID used by Premiere Pro to include timecode information in AVI files, but if a program doesn't understand what to do with that information, it can simply ignore the unknown chunk.
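Here's a toy reader in the same vein, to make that skip-what-you-don't-recognise behaviour concrete. The set of known IDs, the flat sequence of chunks and the payloads are all illustrative assumptions, and a real AVI parser would also have to descend into the RIFF and LIST container chunks, but the principle is exactly what keeps old software working when new chunk types appear.

```python
import struct

def chunk(fourcc, payload):
    # Same on-disk layout as the sketch above: ID, size, payload, pad byte.
    return (fourcc.encode("ascii") + struct.pack("<I", len(payload))
            + payload + (b"\x00" if len(payload) % 2 else b""))

KNOWN = {"avih", "movi"}  # the IDs this toy reader claims to understand

def walk_chunks(data):
    """Walk a flat run of RIFF-style chunks, handling the IDs we know and
    skipping the rest. A real AVI parser would also have to descend into
    the RIFF and LIST container chunks, which this sketch glosses over."""
    offset = 0
    while offset + 8 <= len(data):
        fourcc = data[offset:offset + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack_from("<I", data, offset + 4)
        if fourcc in KNOWN:
            print(f"handling {fourcc!r}: {size}-byte payload")
        else:
            print(f"skipping unknown chunk {fourcc!r}")
        offset += 8 + size + (size % 2)  # step over the pad byte after odd payloads

# A dummy 'avih' chunk, followed by a 'Tdat' chunk this reader doesn't understand.
walk_chunks(chunk("avih", b"\x00" * 56) + chunk("Tdat", b"01:00:00:00"))
```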
Implications
This implies two things. The first is that the contents of an AVI or Quicktime file are completely independent of the file format in question. It's possible to put incredibly high-quality material – uncompressed 16-bit 4K, for instance – in an AVI or Quicktime file, if you've got a system powerful enough to play it back. With the help of crafty tools, it's possible to do deeply incongruous things, like putting Apple's favourite codec, ProRes, into an AVI file. And because the AVI file so carefully describes what it contains, Quicktime will actually play that file back quite correctly (or at least it used to – it's an obscure enough trick that it isn't attempted very often).
The other is that these formats are very extensible, and extensible without breaking compatibility with old software. After all, the media player VLC doesn't know about timecode, so it simply ignores a “Tdat” chunk if one exists in an AVI file. Metadata of any kind could be added without any compatibility problems and without upgrading old software to cope, and it's a shame this approach hasn't been used more often. The Broadcast Wave file format extends Wave in just this backward-compatible manner; MXF files, on the other hand, arguably represent something of a reinvention of the wheel, and have demanded a lot of new software engineering which could possibly have been avoided.
It isn't my purpose here to propose AVI files for new applications, because they have problems I've skimmed over; I've used them as an example because they're relatively simple (Quicktime, in many ways, is extremely similar in structure, and since MP4 is based on Quicktime, the same goes for MP4). In any case, let's not fall for the idea that any of these reasonably modern formats is limited in terms of the quality of media or the amount of metadata it can contain. They're containers, and they can contain an almost limitless amount of stuff.
So, which is best? It doesn't matter. The data a file contains is completely separate from the container it sits in. The only consideration is compatibility.