In the midst of the word he was trying to say,
   In the midst of his laughter and glee,
He had softly and suddenly vanished away—
   For the Snark was a Boojum, you see.

         Lewis Carroll, “The Hunting of the Snark”

Boojums are mysterious whatsits that make things disappear, not with a big crash but without your ever noticing. You may think that important files are on your disk, nice and safe, but a boojum may have taken them away, leaving digital garbage in their place. Once that happens, every backup you make from then on faithfully copies the garbage, so it’s just as useless.

There are ways to guard against silent file damage, but they’re rather inconvenient and time-consuming. You can store a checksum of a file and verify the file against it periodically. You can run an integrity-checking application on each file. But realistically, are you going to do this on files that you otherwise aren’t bothering to touch?
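Storing and re-checking checksums is at least easy to script. Here is a minimal Python sketch, assuming SHA-256 is good enough and a plain dictionary will serve as the manifest. The function names `sha256_of`, `build_manifest` and `verify_manifest` are illustrative, not taken from any particular tool. A file whose checksum has changed even though you haven’t edited it is the warning sign.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def verify_manifest(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files under root with a previously built manifest."""
    report: dict[str, list[str]] = {"missing": [], "changed": []}
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            report["missing"].append(rel)
        elif sha256_of(path) != expected:
            # Either you edited it, or something damaged it.
            report["changed"].append(rel)
    return report
```

You could save the manifest with `json.dump` alongside your backups and run `verify_manifest` against it every few months; that is still the periodic chore described above, just a faster one.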

There’s another kind of boojum that checking individual files won’t catch: files that simply disappear completely. You never know why. Maybe you absent-mindedly deleted them. Maybe they got overlooked (note artful use of the passive voice) when you copied them to a new computer. Maybe they went to the same place missing socks go. Whatever the cause, they’re completely gone.

The best low-effort defense against boojums is to keep older backups around. There’s a limit to how many storage volumes you really want to keep, but it could be a good idea to keep that old backup drive rather than throwing it away. This can lead to other problems, of course; just where are you going to find the hardware to connect to that SCSI drive and get the files off it? When you do, just how are you going to read files in some long-forgotten format? These are issues for future posts, but let’s get back to data corruption for now.

Some file formats are safer from this kind of damage than others. Whole files don’t usually get overwritten; generally it’s just a few bits that go wrong. With some formats, a flipped bit in the wrong place makes the whole file unreadable. This is true of most compressed file formats, such as ZIP and GZIP. It’s also often true of image formats. But if a text or HTML file suffers this kind of damage, you’ll probably just lose a character or two. Other formats, like XML, might go bad but be easy to repair by visual inspection.
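To see the difference, here is a small Python experiment (an illustration, not a rigorous test) that flips a single bit in a plain-text buffer and in a gzip-compressed copy of the same text. The helper names are hypothetical. The plain text should come out with exactly one wrong byte, while the damaged gzip data will typically fail to decompress at all, either because the deflate data is no longer valid or because the CRC-32 stored in the gzip trailer no longer matches.

```python
import gzip
import zlib

def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with one bit inverted, like a tiny boojum."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)

def bytes_changed(original: bytes, damaged: bytes) -> int:
    """Count positions where two equal-length buffers differ."""
    return sum(a != b for a, b in zip(original, damaged))

def gzip_survives_bit_flip(payload: bytes) -> bool:
    """Compress payload, flip a bit mid-stream, and see if it still reads back intact."""
    packed = gzip.compress(payload)
    damaged = flip_bit(packed, len(packed) // 2)  # lands in the compressed data
    try:
        return gzip.decompress(damaged) == payload
    except (OSError, EOFError, zlib.error):  # BadGzipFile is a subclass of OSError
        return False

text = ("The Snark was a Boojum, you see.\n" * 50).encode("ascii")
print("plain-text bytes changed:", bytes_changed(text, flip_bit(text, 100)))
print("gzip copy still readable:", gzip_survives_bit_flip(text))
```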

The moments of greatest risk are when you do a major software upgrade and when you move to a new computer. Rewriting as much of the disk as a big installation does, or copying everything to a new drive, can turn latent problems into disasters. You do back up before doing things like those, right? Beyond that, take some time to make sure everything you need is still there before you throw out the old machine or backup.
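One way to do that check, sketched in Python under the assumption that the old and new copies are both mounted as ordinary directories: walk the old tree and list every file that is missing from the new one or whose contents differ. `check_migration` is a hypothetical helper name, not an existing tool.

```python
import filecmp
from pathlib import Path

def check_migration(old_root: Path, new_root: Path) -> tuple[list[str], list[str]]:
    """List files under old_root that are missing from new_root or differ from it."""
    filecmp.clear_cache()  # don't reuse stale results from earlier comparisons
    missing: list[str] = []
    different: list[str] = []
    for old_file in sorted(old_root.rglob("*")):
        if not old_file.is_file():
            continue
        rel = old_file.relative_to(old_root)
        new_file = new_root / rel
        if not new_file.is_file():
            missing.append(str(rel))
        # shallow=False compares contents byte by byte, not just size and mtime
        elif not filecmp.cmp(old_file, new_file, shallow=False):
            different.append(str(rel))
    return missing, different
```

The `shallow=False` argument matters: by default `filecmp.cmp` treats files with matching size and modification time as identical without reading them, which would miss exactly the silent damage this post is about.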

There are a number of ways to reduce the likelihood of file damage.

  1. Do regular backups, preferably with multiple copies.
  2. Don’t work with a file on a network drive. Edit it on a local drive, then copy it to the network volume. Likewise for flash drives (which you shouldn’t consider reliable primary storage anyway).
  3. Avoid working on a disk which is nearly full.
  4. Shut down your computer properly. Don’t force-quit applications unless you absolutely have to; and if you do, check as soon as possible whether the files that application had open are intact or need to be restored from a backup.
  5. Run a disk verification utility periodically, letting it repair the disk if necessary.
  6. Don’t let malware onto your computer.
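For item 3, a quick free-space check is easy to automate. This Python sketch uses the standard `shutil.disk_usage` call; the 10% threshold is an arbitrary assumption, not a documented limit, so pick whatever margin you’re comfortable with.

```python
import shutil

def free_fraction(path: str = ".") -> float:
    """Fraction of the volume containing `path` that is still available."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # guard against odd pseudo-filesystems
        return 0.0
    return usage.free / usage.total

def disk_nearly_full(path: str = ".", min_free: float = 0.10) -> bool:
    """True if less than `min_free` of the volume is free (10% is an assumed threshold)."""
    return free_fraction(path) < min_free

if __name__ == "__main__":
    print(f"{free_fraction() * 100:.1f}% free")
    if disk_nearly_full():
        print("Warning: this disk is nearly full; free up space before heavy work.")
```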

There’s a tradeoff between saving space and safety against damage. If you save compressed archives of whole directories, a little damage might cost you everything in the directory. If you save them uncompressed, you’re more likely to lose only individual files (though directory structures and whole volumes can be damaged too). With today’s huge storage volumes, compression isn’t as important as it once was.
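The uncompressed side of that tradeoff can be demonstrated with Python’s `tarfile` module: build a plain, uncompressed tar of two small files in memory, flip one bit inside the first file’s data, and read the archive back. This is a toy sketch with made-up file names, and it only damages file contents, not the tar headers, so it shows the favorable case. The second file should come back intact and the first with one wrong byte.

```python
import io
import tarfile

def make_tar(files: dict[str, bytes]) -> bytes:
    """Pack name -> contents pairs into an uncompressed tar archive held in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def damage_member(archive: bytes, name: str) -> bytes:
    """Flip one bit in the first data byte of the named member."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        offset = tar.getmember(name).offset_data  # where that member's contents start
    damaged = bytearray(archive)
    damaged[offset] ^= 0x01
    return bytes(damaged)

def readable_members(archive: bytes) -> dict[str, bytes]:
    """Extract every member that can still be read; stop quietly at the first hard error."""
    recovered: dict[str, bytes] = {}
    try:
        with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
            for member in tar:
                f = tar.extractfile(member)
                if f is not None:
                    recovered[member.name] = f.read()
    except (tarfile.TarError, OSError):
        pass
    return recovered

files = {"first.txt": b"Snark\n" * 10, "second.txt": b"Boojum\n" * 10}
damaged = readable_members(damage_member(make_tar(files), "first.txt"))
print(sorted(damaged))
print("second file intact:", damaged["second.txt"] == files["second.txt"])
```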

Sometimes I do follow my own advice. After writing all of the above, I realized I hadn’t verified my own main drive for quite a long time. I ran the Mac OS Disk Utility and verified the disk, and there were indeed errors! By rebooting from the installation DVD and running Disk Utility from there, I was able to repair the problems. (But what do you do if your computer company has started shipping its operating systems on the cheap, without installation discs, as with OS X Lion?)

Suggestions: Follow the steps above as much as you can. If you’re an admin, resist pressure to throw away “redundant” backups. If you get reports of files that have turned into garbage, consider the possibility that the file system itself has been damaged.
