Why can't you freeze-frame sound?

Written by David Shapton | Jan 30, 2016 9:00:00 AM

There's something rather curious about sound, which is that it can range from a simple, single tone like a sine wave (or a flute, if you like) to a complete orchestra at the climax of a symphony. Even day-to-day background "atmospheres" can be a complex mosaic of hundreds of sonic elements: just think of the din at a supermarket checkout, or the racket at a soccer match when someone scores a goal.

You can see how complex sounds are created: they're made by all the individual sound-making things that are present at the scene. In fact, just about everything is capable of making a sound, if given a bit of kinetic energy through banging, hitting or shaking. All of these sound-generators get added together and the result is the mix of audio that you perceive, depending on your position.

So, nothing surprising there, then.

But what is surprising is that sounds of virtually any complexity can be reproduced by a loudspeaker, which is just a single sound-making element - normally a black paper cone.

How does a paper cone recreate an orchestra?

How can a single loudspeaker cone recreate the sound of an orchestra, which can have a hundred sound sources, not to mention the ambience of the room and the noises from the audience? A loudspeaker cone can only be in one place at a time, not hundreds. Intuitively, it doesn't sound like the sort of thing that can make a noise like an orchestra. The same applies, by the way, to microphones, which, again, only have one diaphragm or sound-sensing element.

If this strikes you as curious, then the way to understand it is to think about what's actually happening to the air at any point in any place where there is sound. Air consists of atoms and molecules. When large numbers of them move together you get a pressure wave. It is when this wavefront meets an eardrum, which is itself a single-membrane transducer, that the process of human audio perception begins.
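
As a rough illustration of that idea, here is a minimal Python sketch. The three "sources", their frequencies and amplitudes, and the 48 kHz sample rate are all made up for the example. It shows that however many sources there are, the air at one point (or a cone, or an eardrum) only ever has one value at any instant: the sum of everything arriving there.

```python
import math

SAMPLE_RATE = 48_000  # samples per second (an assumed, common audio rate)

def mix_at(t, sources):
    """Return the single combined value at time t (in seconds).

    Each source is a hypothetical (frequency_hz, amplitude) pair.
    """
    return sum(amp * math.sin(2 * math.pi * freq * t) for freq, amp in sources)

# Three made-up "instruments" playing at once
sources = [(220.0, 0.5), (440.0, 0.3), (660.0, 0.2)]

# 10 ms of the mix: one number per instant, no matter how many sources
signal = [mix_at(n / SAMPLE_RATE, sources) for n in range(SAMPLE_RATE // 100)]
```

Adding a hundred more entries to `sources` would change the values in `signal`, but not the fact that it is a single stream of numbers over time.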

Sound as a wave

It is the idea of sound as a wave that is important here. If you're used to thinking about video, you'll know what happens when you "freeze" playback. What you see is a picture, but it's motionless. Here's the key to the difference between video and audio: if you freeze audio playback, what do you get? Nothing. Silence, in fact.

Why is this? Why can't you "freeze" sound? I've been asked this question more than once by video editors, frustrated at the inability of their editing system to play the sound that's happening at the point of an edit. Some systems play back a loop of audio one frame long, but I've never found that to be very useful because nature doesn't conveniently divide audio events into chunks one twenty-fourth of a second in duration.
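
One way to see why a one-frame loop is awkward is to do the arithmetic. This sketch assumes 48 kHz audio and 24 frames per second, and the helper names are invented for the example. Even a perfectly steady tone only loops without a jump if a whole number of its cycles happens to fit in one frame.

```python
SAMPLE_RATE = 48_000  # assumed audio sample rate, samples per second
FRAME_RATE = 24       # assumed video frame rate, frames per second

# Number of audio samples that sit under a single video frame
samples_per_frame = SAMPLE_RATE // FRAME_RATE

def cycles_in_one_frame(freq_hz):
    """How many cycles of a steady tone fit inside one video frame."""
    return freq_hz / FRAME_RATE

def loops_cleanly(freq_hz):
    """True if a whole number of cycles fits, so the loop point has no jump."""
    cycles = cycles_in_one_frame(freq_hz)
    return abs(cycles - round(cycles)) < 1e-9

print(samples_per_frame)     # samples in one frame
print(loops_cleanly(480.0))  # 480 / 24 is a whole number of cycles
print(loops_cleanly(440.0))  # 440 / 24 is not
```

Real sounds are far less tidy than a steady tone, so a frame-length loop will rarely line up with anything meaningful in the audio.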

You can't hear sound when you stop playback because sound is always moving. It exists in time, not space. Images are spatial. When we talk about "resolution", we mean spatial resolution: how many pixels per inch or centimetre or whatever. If you stop a loudspeaker cone moving, no sound comes out. To hear a sound, time has to be passing.
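
The same point can be sketched in a few lines of Python. The 440 Hz tone, the 48 kHz sample rate and the function names are assumptions for the example. "Freezing" audio means holding one sample value, and a held value has no movement in it, so there is nothing to hear.

```python
import math

SAMPLE_RATE = 48_000  # assumed sample rate, samples per second
TONE_HZ = 440.0       # assumed test tone

def tone(n):
    """Value of the test tone at sample number n."""
    return math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)

def peak_to_peak(samples):
    """How far the signal swings: zero means the cone is not moving."""
    return max(samples) - min(samples)

playing = [tone(n) for n in range(480)]  # 10 ms of normal playback
frozen = [tone(100)] * 480               # "freeze" at sample 100: one value, held

print(peak_to_peak(playing))  # a large swing: the cone is moving
print(peak_to_peak(frozen))   # no swing at all: silence
```

The frozen value itself is not zero; the cone would simply sit slightly off-centre, motionless.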

Temporal resolution

But how can this single moving cone create a complex sound? It's because although it can only be in one place at one time, it can move in a complex way. Over time, a loudspeaker can move in almost exactly the same way as the sound wave it is trying to reproduce. Sound doesn't have a spatial resolution, but it does have a temporal one. It exists in time and not space. Pixels are meaningless for sound. In audio, the equivalent of pixel resolution is the sample rate: the more samples taken per second, the more faithfully rapid changes in the sound are captured.
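
To put a number on what the sample rate buys you, here is a small sketch, assuming a 48 kHz rate and two test tones that are not from the article. It relies on a standard result of sampling theory: a tone above half the sample rate produces the same samples as a lower one (here 30 kHz lands on 18 kHz, with the sign flipped), so it cannot be captured as itself.

```python
import math

SAMPLE_RATE = 48_000  # assumed sample rate, samples per second

def sample_tone(freq_hz, count):
    """Sample a sine tone of freq_hz at SAMPLE_RATE, returning count values."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE) for n in range(count)]

high = sample_tone(30_000.0, 64)  # above half the sample rate (24 kHz)
low = sample_tone(18_000.0, 64)   # 48 kHz minus 30 kHz

# Sample for sample, the 30 kHz tone equals the 18 kHz tone with its sign flipped,
# so adding the two should cancel to (almost exactly) zero everywhere
largest_difference = max(abs(h + l) for h, l in zip(high, low))
print(largest_difference)
```

Raising the sample rate pushes that limit upwards, which is one sense in which faster sampling gives more accurate audio.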

But that's not all when it comes to comparing audio and video.

Video has temporal resolution as well: it's called the frame rate. And, yes, the higher the frame rate, the more information you actually see. Whether or not that's a good thing depends on who you ask - and it's a big subject that we've already covered extensively in RedShark.
