If you want true data integrity and a real backup system, RAID is not where you should be looking.
RAID is not backup.
This is the reality currently sinking into the mind of someone in the USA who's apparently in the process of taking a large software company to court. The case involves a six-figure sum on the basis that software developed by that company deleted some files. Those files represented movie footage, so they were presumably reasonably large, but it hardly matters: a bitcoin private key is 256 bits in length, a princely 32 bytes of data, and could presumably represent almost any arbitrary amount of money.
Fast, reliable, useful, but not backup
The details of the failure don't make much difference in the final analysis, but it's always worth being aware of what happens when things go wrong, so we can avoid it happening again. The files in question were lost from a cache, or at least from a conventional folder full of data which had been nominated as a cache. The very name “cache” suggests a somewhat temporary nature, though the files apparently shouldn't have been removed under normal circumstances. All non-trivial software has bugs, though, and it's not the first time that a major software company has shipped one that destroys data.
This is backup
Operating systems, which naturally have responsibility for absolutely all file management on a computer, have glitched in vaguely similar and just as awkwardly public ways. Usually, it happens when some sort of regularly scheduled process is taking place such as automatic cleanup of (apparently) unused files or a regularly scheduled removal of (presumably) disused cache material. There's a side issue to consider here as to how great an idea it can be to allow automated processes write access to hard disks. Even so, who among us has never enthusiastically group-selected an inconveniently large collection of files ending in ‘.tmp’ and hit the delete key with our fingers crossed?
This is possibly backup — cloud storage tends to be fairly well protected, but it's still only one copy because it's subject to a lot of single points of failure
And that can happen to anyone. So the fact that we're talking about video files and production software is largely irrelevant: the court will look at the facts, of course, but, on the face of it, the claimed value is entirely plausible and similar things happen all the time. What is relevant, and something that even quite inexperienced hands will already be screaming at the screen, is that this data wasn't...
Do we even need to say it?
The point has been made a million different ways. RAID, again, is not backup. A disk array protects your data against hardware failures. It doesn't protect against accidents or flaky software: delete a file from a RAID and the RAID will carefully and faithfully protect the results of that action just as it protects everything else. Also, RAID does not remove all single points of failure from any computer setup. It's frighteningly easy to give examples. Power supply failures, for instance, can be particularly pernicious. Most of them work by turning the incoming alternating-current mains into (approximately) DC, then switching that high-voltage DC through inductors at high frequency.
This might well be backup because those two red rectangles are two different drive bays on a Blackmagic Hyperdeck, which both have SSDs in them. Two is better than one
In 240V countries that intermediate DC supply is often above 300V, while the internal components of a computer may be designed for 5V, 3.3V or even lower voltages. Needless to say, if the high-voltage DC makes its way through the power supply due to a fault, it will destroy every component of the system, including all the disks in a RAID, the RAID controller, the motherboard it's plugged into, the RAM, the CPU, and the “on” light. And yes, that'll happen long, long before the mains inlet fuse clears the fault. Well-designed power supplies are built to make that extremely unlikely, but cheap ones may not be, and it's just one example.
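For anyone who prefers to see the argument run, here is a toy model in Python. It is only a sketch: the `MirroredArray` class and the `footage.mov` file name are made up for illustration, and a real RAID controller works at the block level rather than with named files. The behaviour it illustrates is the one described above: a mirror survives a single dead disk, but it faithfully copies a deletion to every member, and a fault that takes out the whole chassis takes out every member at once.

```python
import copy


class MirroredArray:
    """Toy mirror (RAID 1 style): every operation is applied to all live disks."""

    def __init__(self, disks=2):
        # Each "disk" is just a dict mapping file name to contents.
        self.disks = [dict() for _ in range(disks)]

    def write(self, name, data):
        for disk in self.disks:
            if disk is not None:
                disk[name] = data

    def delete(self, name):
        # The array cannot tell a mistake from an intention:
        # the delete is mirrored just like a write.
        for disk in self.disks:
            if disk is not None:
                disk.pop(name, None)

    def fail_disk(self, index):
        # A single drive dies; the others carry on.
        self.disks[index] = None

    def chassis_fault(self):
        # For example, a power supply fault that destroys every drive.
        self.disks = [None] * len(self.disks)

    def read(self, name):
        for disk in self.disks:
            if disk is not None and name in disk:
                return disk[name]
        return None


array = MirroredArray(disks=2)
array.write("footage.mov", b"rushes")

# A separate copy, taken at this moment and kept away from the array.
backup = copy.deepcopy(array.disks[0])

array.fail_disk(0)
survived_disk_failure = array.read("footage.mov")  # still readable

array.delete("footage.mov")
after_delete = array.read("footage.mov")           # gone from the array

array.chassis_fault()
after_chassis_fault = array.read("footage.mov")    # also gone

print(survived_disk_failure, after_delete, after_chassis_fault)
print(backup["footage.mov"])                        # the separate copy is intact
```

The only thing that survives both the deletion and the chassis fault is the copy that was never part of the array in the first place.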
It remains to be seen whether any court is likely to take the legal claim seriously, as such things are often decided on obscure points of precedent and semantics which generally have little or nothing to do with what most people would consider natural justice. Opinions on that are likely to vary between a hardline interpretation, that the software bug is a bug and should not have occurred, and the view that the user has been astoundingly cavalier with important information. Both are true. Overall, the technical community is likely to take a pretty dim view of any approach that makes key data so vulnerable.
The lesson from this is that we should always assume we have one fewer copy of crucial data than we actually have. If we have one copy, we might as well have no copies. And even if we have two, if there's a failure, then we have one, and we should be feeling very, very uneasy. In the end, an LTO drive is irritatingly expensive, but it's not hard to find a failure that costs even more.
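That rule of thumb is easy enough to turn into a check. The following is a minimal Python sketch, not a backup tool: the function names `sha256_of` and `copies_to_rely_on` are invented here, and it assumes each path you pass in really does live on a separate device, which the code has no way of confirming. It hashes the original, counts the copies that exist and match byte for byte, and then subtracts one.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large footage files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_to_rely_on(original, candidate_copies):
    """Count verified copies (including the original), then assume one fewer.

    A candidate only counts if it exists and its contents match the original.
    Assumption: every candidate sits on a different device from the original.
    """
    reference = sha256_of(original)
    verified = 1  # the original itself
    for candidate in candidate_copies:
        candidate = Path(candidate)
        if candidate.is_file() and sha256_of(candidate) == reference:
            verified += 1
    return max(verified - 1, 0)
```

With only the original and no copies, the function returns 0, which is the article's point: one copy might as well be none.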