For most of the history of film, if you wanted to insert something into the picture that didn't exist, the camera had to be stationary. Motion tracking allows artificial objects to be inserted convincingly into real footage. Phil Rhodes explains.
Convincing alignment between the apparent motion in the frame and the motion of an inserted, unreal visual effects element was simply beyond the sort of precision a human being can achieve, even given an animation stand and a stack of paper with punched registration holes, and even if it was OK for the job to take a week.
Manual registration between real and unreal elements has been done – Who Framed Roger Rabbit is a shining example of the technique – but usually with visibly unreal objects. People wanting to drop in more than a cartoon rabbit* need motion tracking, which is really just a term for getting the computer to evaluate how things are moving. This is something a machine can do with a degree of consistency that humans, practically speaking, can't. It's worth briefly examining how this is actually done, because knowing how it works helps us shoot material that allows it to work better. Or, to put it another way, it's worth knowing how to keep the visual effects department sweet without having to organise a weekly beer delivery to their offices. Most people are aware of how poorly shot material can cause problems with chromakey shots, but motion tracking is just as sensitive to noise and heavy compression.
The first users of point tracking techniques were the military, with their keen professional interest in designating a thing to blow up and allowing an automated system to make sure the bombs, bullets, rockets and missiles all went down the appropriate chimney. The earliest implementations performed a simple exhaustive search, taking the chunk of image containing the target and comparing its pixel values with those of potentially matching areas, subtracting one set from the other and looking for a result near zero.
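To make the subtraction idea concrete, here is a minimal sketch in Python with NumPy. It assumes greyscale frames stored as 2-D arrays, and it combines the per-pixel differences into a single number by summing their absolute values (the sum of absolute differences, or SAD). That is one common choice, not necessarily what any particular historical system used, and the function name exhaustive_match is hypothetical.

```python
import numpy as np

def exhaustive_match(frame, template):
    """Try every position of `template` inside `frame` (hypothetical helper).

    Returns the (row, col) of the best top-left corner and the sum of
    absolute differences (SAD) there. A perfect match scores 0.
    """
    frame = frame.astype(np.float64)
    template = template.astype(np.float64)
    th, tw = template.shape
    fh, fw = frame.shape
    best_pos, best_score = None, np.inf
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            window = frame[r:r + th, c:c + tw]
            # Subtract one set of pixels from the other and total the magnitudes.
            score = np.abs(window - template).sum()
            if score < best_score:
                best_pos, best_score = (r, c), float(score)
    return best_pos, best_score

rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(40, 40))   # random "image" with plenty of detail
template = frame[12:20, 25:33]                # the chunk containing the target
print(exhaustive_match(frame, template))      # ((12, 25), 0.0)
```

Every candidate position in the frame is visited, so the cost grows with the frame area multiplied by the template area, which is why this brute-force form is slow.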
This is a long-winded and simplistic approach, and more modern mathematics allows for various refinements, but the fundamental idea is that of looking for matching images frame by frame. That's why most point trackers allow the user to define both a target area and a search box, which limits the amount of image that must be searched to find a match. This makes things a lot faster, but of course the search box needs to be large enough to enclose the largest possible motion between frames, or the target area will move outside it. At this point, the track will fail spectacularly, because this technique doesn't, in its simplest form, detect a good match; it simply detects the least worst match, which may be a very bad match indeed.
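The same search can be confined to a box around where the target was last seen. In this sketch (again Python with NumPy), the function track_in_search_box and its radius parameter, which stands in for the search box, are hypothetical. Note that the function always returns a position: when the feature moves further than the box allows, what comes back is just the least worst candidate.

```python
import numpy as np

def track_in_search_box(frame, template, prev_pos, radius):
    """Search only within +/- `radius` pixels of the previous position.

    Returns the best (row, col) and its SAD score. Nothing here checks
    whether the match is actually good; if the feature left the box,
    the result is simply the least-bad candidate inside it.
    """
    frame = frame.astype(np.float64)
    template = template.astype(np.float64)
    th, tw = template.shape
    fh, fw = frame.shape
    r0, c0 = prev_pos
    best_pos, best_score = None, np.inf
    # Clamp the search box so the template never hangs off the frame edge.
    for r in range(max(0, r0 - radius), min(fh - th, r0 + radius) + 1):
        for c in range(max(0, c0 - radius), min(fw - tw, c0 + radius) + 1):
            score = np.abs(frame[r:r + th, c:c + tw] - template).sum()
            if score < best_score:
                best_pos, best_score = (r, c), float(score)
    return best_pos, best_score

rng = np.random.default_rng(0)
frame0 = rng.integers(0, 256, size=(60, 60))
template = frame0[20:28, 20:28]                       # target at (20, 20) in frame 0
frame1 = np.roll(frame0, shift=(3, 4), axis=(0, 1))   # whole image moves 3 down, 4 right

print(track_in_search_box(frame1, template, (20, 20), radius=5))  # ((23, 24), 0.0)
print(track_in_search_box(frame1, template, (20, 20), radius=2))  # wrong position, large SAD
```

A radius of 5 pixels comfortably encloses the move and finds a perfect match; a radius of 2 does not, yet the function still reports a position, with a large score as the only hint that something has gone wrong. One simple (hypothetical) safeguard would be to flag any frame whose score exceeds a threshold for a human to check.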
The earliest trackers worked like this, and some do to this day. The problem is that two digital images of an object in a completely stationary scene aren't precisely identical, let alone two images of an object in a scene with a lot of random motion in it. The simple technique described above does tolerate a little slippage and noise, because it picks whichever potential match is best rather than requiring a perfect one. This is essential in any event because a three-dimensional object may (and almost certainly will) change in size and orientation from frame to frame, and the tracker must be able to keep up with that. Nevertheless, on slow movement or in a scene without much contrast or fine detail, noise and compression artefacts can start to affect the quality of a track.
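The effect of noise is easy to demonstrate. The sketch below (Python with NumPy; best_match and noisy are hypothetical helpers) builds a completely static scene, takes the target from one noisy "frame" and searches a second noisy frame of the same scene. Modelling noise as independent Gaussian values per pixel is a simplifying assumption; real sensor noise and compression artefacts are neither independent nor Gaussian. Even with nothing moving, the best score is no longer zero. With strong detail the match should still land in the right place; with detail no bigger than the noise, the chosen position is likely to be somewhere else entirely.

```python
import numpy as np

def best_match(frame, template):
    """Exhaustive SAD search; returns ((row, col), score). Hypothetical helper."""
    th, tw = template.shape
    scores = np.array([[np.abs(frame[r:r + th, c:c + tw] - template).sum()
                        for c in range(frame.shape[1] - tw + 1)]
                       for r in range(frame.shape[0] - th + 1)])
    r, c = np.unravel_index(np.argmin(scores), scores.shape)
    return (int(r), int(c)), float(scores[r, c])

rng = np.random.default_rng(42)

def noisy(img, sigma):
    # Assumed noise model: independent Gaussian noise added to every pixel.
    return img + rng.normal(0.0, sigma, img.shape)

texture = rng.uniform(0.0, 1.0, (48, 48))   # the scene's fine detail
for contrast in (100.0, 1.0):
    scene = 128.0 + contrast * texture       # same detail, strong vs. weak contrast
    template = noisy(scene, 2.0)[16:24, 16:24]   # target taken from "frame 1"
    next_frame = noisy(scene, 2.0)               # "frame 2": static scene, fresh noise
    pos, score = best_match(next_frame, template)
    print(f"contrast {contrast:>5}: best match {pos} (true (16, 16)), SAD {score:.1f}")
```

The output varies with the random seed, so no expected values are given; the point is the comparison between the two contrast settings under identical noise.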
We've all seen effects composites shot on film cameras that could perhaps have done with a clean and re-lube, and the resulting bob-and-weave between two objects on screen. The results of a lousy track are somewhat similar – or, more to the point, the result of a lousy track is that some underpaid intern has to go through the shot frame by frame, several times, lining it up by hand. The excellence of modern VFX tools, and the way they facilitate this sort of manual cleanup of problem footage, can obscure the problems caused by noisy film stocks, codecs and sensors, and by the mistakes all cameramen occasionally make. And it's not just about greenscreen, either.
* I want to be clear that this is a vile calumny upon the good name of Roger Rabbit, which is a technical and artistic tour de force of the highest calibre, and a rare example of a feature-length film shot entirely in a format bigger than 35mm. And it has Bob Hoskins in it.