RedShark Winter Replay: How fast is your computer inside?

Written by Phil Rhodes | Dec 31, 2013 12:00:00 AM

This holiday we're re-running some of our most popular articles, in case you didn't see them the first time. Today: Digital video has always meant moving a lot of data around inside a computer. And as computers get ever more powerful, Phil Rhodes explains what really goes on inside your video workstation.

It's a familiar situation, at least to anyone involved in film or TV post-production work, to wish that computers were faster. We're all used to the headline specifications of computers: the type of processor, the speed at which it runs, and the amount of memory. But there are details behind the scenes which mean that before about 2003, for example, it was impossible to make an interface board for most personal computers that could handle HD-SDI in any useful way. Before then, the video handling software had existed for more than a decade, the disk arrays were big enough, the processors fast enough (though of course less capable than today's), and the memory large enough. The limit was the interconnections designed to transfer data inside the computer, which until quite recently meant that it simply wasn't possible to move, say, HD video between an expansion card and the rest of the machine in real time.

The distinction between the maximum amount of data that can be sent from one place to another, versus the amount that a device is capable of generating, is often overlooked. For instance, the protocol that's generally used to connect hard disks to computers is usually capable of either 300 or 600 megabytes per second, as is often quoted on promotional materials for hard disks. The thing is, that's the speed at which the electrical connection between the disk and the computer operates, and it tells us nothing about the performance of the component in practice, which will be limited by its mechanical parts.

So, this is the first in a series of two articles about the way we connect computer parts together, and how we connect external parts to computers. It's designed to go some way beyond the common, fairly basic understanding about clock speeds and RAM capacity, to allow a more intelligent specification of new hardware purchases, as well as to dispel some of the uncertainty that exists around the subject.

Buses and Peripherals

In computing, we refer to any path through which data can flow as a Bus. The original Buses were a set of printed circuit board traces (tracks) connected to a processor through which instructions and/or data could flow, with eight traces for an eight-bit processor, 16 for a 16-bit processor and so on. With each transition of an associated clocking signal from a logic 0 (no volts) to a 1 (often five volts), the processor would either receive an instruction to perform, or load some data upon which to perform those instructions. The nearest equivalent in a modern workstation is the front-side Bus, through which the processor is connected to devices such as the memory controller, and other hardware responsible for connecting to other expansion devices – Peripherals, in the jargon.

The traditional front-side Bus is moving into history with modern processors which typically include on-board memory controllers, but nonetheless there is generally a high performance Bus which is used to communicate with other parts of the computer. Intel call their specification for this Bus QuickPath Interconnect (QPI), while AMD use a much-updated version of a standard that was also used in PowerPC Macs back in the day, called HyperTransport. Both are somewhat more sophisticated variations on a collection of PCB traces down which data flows, often at very high speed. The initial implementation of QPI was capable of transferring nearly 26 gigabytes per second between an Intel Core i7 900-series processor and its X58 input-output controller, which allows the processor to communicate with the rest of the system.
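As a rough check on that figure, here is a minimal sketch of the arithmetic. The inputs are assumptions not stated above: a link running at 6.4 billion transfers per second, carrying 16 bits of data per transfer in each direction, with both directions counted together. The function name is purely illustrative.

```python
def link_gb_per_s(transfers_per_s: float, data_bits: int, directions: int) -> float:
    """Peak data rate of a link in gigabytes per second (1 GB = 1e9 bytes)."""
    bytes_per_transfer = data_bits / 8
    return transfers_per_s * bytes_per_transfer * directions / 1e9

# Assumed figures for the first QPI implementation: 6.4 GT/s, 16 data bits, 2 directions.
one_way = link_gb_per_s(6.4e9, 16, 1)
both_ways = link_gb_per_s(6.4e9, 16, 2)
print(f"one direction: {one_way:.1f} GB/s, both directions: {both_ways:.1f} GB/s")
```

Under those assumptions the combined figure comes out a little below 26 gigabytes per second, which is consistent with the number quoted above.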

These very fundamental levels of interconnection are often of little interest to the user. None of them are made available in any form that allows external devices to be connected, and the parts to which they are connected are basic, essential components such as the processor or memory. Effectively all modern workstations (and even quite modest home PCs) have a front-side Bus, or a more modern equivalent, which is more than capable of transferring most HD video in real-time.

One level up from this, however, and we begin to encounter true input-output (I/O) Buses such as the various generations of Peripheral Component Interconnect (PCI), which have for decades allowed us to customise the configuration of a workstation by plugging in expansion cards. And this is where things become interesting, because the original specification for PCI, which is what the little white slots in the back of most PCs until the last few years represent, was not all that capable. By the standards of the early 90s, it probably seemed OK, with a 32-bit Bus running at 33MHz, and thus capable of transferring about 130 megabytes per second. At the time, most processors weren't capable of doing calculations on more than 16 bits of data at once, and frequently didn't run at much more than 33MHz themselves, so it would have seemed a reasonable specification.

The sharp-eyed will already have noticed, however, that 130MB/sec is not enough to transport 10-bit HD video at cinema rates. The situation is aggravated by the fact that all of the devices plugged into a given computer must share that capability, and where the computer makes a request that a peripheral needs time to fulfil, no other device can use the Bus while the peripheral is at work.
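The PCI shortfall can be worked through with a few lines of arithmetic. This is a sketch under stated assumptions rather than a measurement: the video is taken to be 1920x1080, 10-bit RGB (30 bits per pixel) at 24 frames per second, counting the active picture only, and the Bus figure is the theoretical peak with no allowance for overhead or sharing. Subsampled formats need less data than this.

```python
def parallel_bus_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus moving width_bits once per clock cycle."""
    return width_bits / 8 * clock_mhz  # bytes per cycle x million cycles per second

def video_mb_per_s(width: int, height: int, bits_per_pixel: int, fps: float) -> float:
    """Data rate of uncompressed video in megabytes per second (1 MB = 1e6 bytes)."""
    return width * height * bits_per_pixel / 8 * fps / 1e6

pci = parallel_bus_mb_per_s(32, 33.33)       # original PCI: 32 bits at about 33MHz
hd_rgb = video_mb_per_s(1920, 1080, 30, 24)  # assumed format: 10-bit RGB HD at 24fps

print(f"PCI peak:      {pci:.1f} MB/s")
print(f"HD 10-bit RGB: {hd_rgb:.1f} MB/s")
print("fits" if hd_rgb <= pci else "does not fit")
```

Even before sharing and protocol overhead are considered, the assumed video stream needs more than the whole Bus can theoretically carry.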

PCI Versus PCI Upgrades

Long before it was made obsolete by its successor standard, PCI Express, it had become very clear that PCI was not adequate. Supplements to the standard, such as PCI-X, which doubled the width of the Bus to 64 bits and quadrupled speed to 133MHz, made it possible to build computers capable of handling HD video, 2K film scans and other high performance tasks, especially when certain designs implemented two completely separate PCI Buses within a single machine. These upgrades also included measures to alleviate other logical problems with the basic PCI design, including a better mechanism for Peripherals to make the host system aware that something had happened (“interrupts”), and more elaborate time-sharing, which mitigated the shared nature of the Bus by allowing other Peripherals to use it while one particular Peripheral was working on its response to a request.
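To put numbers on the improvement, the sketch below compares the theoretical peak of the two Buses with the data rate of a 2K film scan. The scan format is an assumption for illustration: 2048x1556 pixels, 10-bit RGB packed at 30 bits per pixel with no padding, at 24 frames per second.

```python
def peak_mb_per_s(width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak of a parallel bus: width in bytes times clock in MHz."""
    return width_bits / 8 * clock_mhz

pci = peak_mb_per_s(32, 33)     # original PCI
pcix = peak_mb_per_s(64, 133)   # PCI-X: double the width, four times the clock

# Assumed scan format: 2048x1556, 30 bits per pixel, 24 frames per second.
scan_2k = 2048 * 1556 * 30 / 8 * 24 / 1e6

print(f"PCI:     {pci:.0f} MB/s")
print(f"PCI-X:   {pcix:.0f} MB/s ({pcix / pci:.1f}x PCI)")
print(f"2K scan: {scan_2k:.1f} MB/s")
```

Doubling the width and quadrupling the clock gives roughly eight times the peak, which is what moves the assumed 2K stream from impossible to comfortable, at least on paper.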

Even so, with electrical signals moving around at frequencies reaching into the hundreds of millions of cycles per second, fundamental physical limitations began to assert themselves. At very high speeds, the differing lengths of individual PCB traces in a 32- or 64-line-wide Bus can cause signals to arrive at slightly different times, with a spread large enough to cause problems, and that’s without considering other electrical factors like capacitance and impedance.

The fundamental difference between PCI and its successor, PCI Express, is that the latter standard considers each Bus line to be an individual signal, intelligently recovering timing data from the information sent. PCIe is therefore a serial Bus, with bits sent in sequential order, as opposed to the parallel Bus of PCI. This avoids many timing problems, allowing much higher overall throughput, even though only one bit can be sent on each PCIe “lane” at once. Many – if not most – PCIe devices actually use several of these serial lanes, synchronising them using numbering embedded in each serially-transmitted packet of data to properly order and interpret the received information. Even better, PCIe is a point-to-point protocol, with no sharing of resources, and provides for full-duplex operation with devices able to send and receive simultaneously.
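The idea of spreading data across several serial lanes and putting it back in order can be shown with a toy model. This is only an illustration of the principle described above, not how PCIe hardware actually frames or aligns its data; the function names, the four-byte chunk size and the packet layout are all invented for the example.

```python
from typing import List, Tuple

Packet = Tuple[int, bytes]  # (sequence number, payload): a made-up layout

def stripe(data: bytes, lanes: int, chunk: int = 4) -> List[List[Packet]]:
    """Cut data into numbered chunks and deal them round-robin across the lanes."""
    queues: List[List[Packet]] = [[] for _ in range(lanes)]
    for seq, start in enumerate(range(0, len(data), chunk)):
        queues[seq % lanes].append((seq, data[start:start + chunk]))
    return queues

def reassemble(queues: List[List[Packet]]) -> bytes:
    """Collect packets from every lane and restore order by sequence number."""
    packets = [packet for queue in queues for packet in queue]
    packets.sort(key=lambda packet: packet[0])
    return b"".join(payload for _, payload in packets)

message = b"uncompressed video frame data"
sent = stripe(message, lanes=4)

# Pretend the lanes deliver in a different order from the one they were filled in.
received = list(reversed(sent))
print(reassemble(received) == message)
```

Because each chunk carries its own number, the receiver does not care which lane delivered it or when, which is the property that frees a multi-lane link from the timing problems of a wide parallel Bus.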

The basic unit of a PCIe Bus connection, a lane, is capable of transferring up to 250MB every second (subject to some engineering limitations). A typical HD-SDI I/O Peripheral, such as those made by Blackmagic Design or AJA, is a four-lane device with sufficient performance to accommodate even high frame rate HD video. And that's the most basic version of the standard: the second revision, which has been current in shipping computers since late 2007, is twice as fast. The PCIe people promise us versions 3 and 4, each broadly representing a doubling of capability, and the biggest PCIe slots have 16 lanes. At version 4 rates, a 16-lane slot offers a lot of capability: over 31 gigabytes a second, or enough to transport widescreen 10-bit RGB 4K at just under a thousand frames per second, if anyone ever had a need to do that.
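The scaling described above can be sketched as a simple model: 250MB per second per lane in the first version, doubling with each version, multiplied by the number of lanes. Treating every version as an exact doubling is a simplification, so the result is slightly more optimistic than the real figure, and the 4K frame size used here (4096x2160 at 30 bits per pixel) is an assumption for illustration.

```python
def pcie_mb_per_s(version: int, lanes: int, v1_lane_mb: float = 250.0) -> float:
    """Approximate PCIe peak, assuming each version exactly doubles the last."""
    return v1_lane_mb * 2 ** (version - 1) * lanes

four_lane_v1 = pcie_mb_per_s(1, 4)     # a typical HD-SDI card, first version
four_lane_v2 = pcie_mb_per_s(2, 4)     # the same card on the second revision
sixteen_lane_v4 = pcie_mb_per_s(4, 16) # the biggest slot at version 4 rates

# Assumed 4K frame: 4096x2160 pixels, 10-bit RGB, 30 bits per pixel.
frame_bytes = 4096 * 2160 * 30 / 8
frames_per_second = sixteen_lane_v4 * 1e6 / frame_bytes

print(f"x4 v1:  {four_lane_v1:.0f} MB/s")
print(f"x4 v2:  {four_lane_v2:.0f} MB/s")
print(f"x16 v4: {sixteen_lane_v4 / 1000:.0f} GB/s")
print(f"4K frames per second: {frames_per_second:.0f}")
```

With these assumptions the 16-lane, version 4 case lands in the same region as the figures quoted above: tens of gigabytes per second and a little under a thousand 4K frames per second.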
