What’s deadlier than the Death Star? The rm *

A year’s worth of production on Toy Story 2 was nearly lost due to user error and a bad backup. The video “Toy Story 2: The Movie Vanishes” gives a fanciful telling of how this happened, but it’s mostly possible to separate fact from creative elaboration. A tweet from @DisneyPixar confirms the authenticity of its source. It’s also on the Toy Story 2 Blu-ray/DVD combo pack.

Leaving aside the clever but silly stuff about the characters vanishing one by one, we can extract this account: A great many critical files for the production of Toy Story 2 were on a single Linux or Unix computer. Someone who shall remain nameless (and probably jobless) mistakenly entered “rm *” in the directory containing these files. An emergency call was made to the sysadmin to just yank the plug out of the wall; this was done about 20 seconds after the command was entered, but by then most of the files were gone. Worse, the backups for the past month were defective to the point of uselessness. Fortunately, the technical director, Galyn Susman, had been working at home and had all the files there. They physically transported her computer to the studio to copy all the files back.

I’m hoping the video takes serious liberties with what happened for the sake of entertainment. If not, the amount of stupidity that happened is staggering. Let’s assume, for the sake of a lesson in what not to do, that they really did all these things.

First, the Linux/Unix command line is dangerous. rm * will delete all the files from your directory. Think three times before hitting Return after typing it.
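
A couple of defensive habits make the command less of a loaded gun; here’s a minimal sketch for a bash-style shell (the alias is a suggestion of mine, not gospel):

  echo *             # preview exactly which files the glob will match
  rm -i *            # ask for confirmation before each deletion
  alias rm='rm -i'   # put this in ~/.bashrc to make -i the default

None of this makes rm safe; each step just gives you one more chance to notice a mistake.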

Second, if you do enter a mistaken rm *, DON’T UNPLUG THE COMPUTER, YOU IDIOT!! That will just damage the file system and won’t be quick enough to save any files. Hit Control-C. It’s much faster and safer, though even that will probably be too late.

But it took 20 seconds to delete all the files. That says there were a lot of files. It also says they were all in a flat structure with no subdirectories, since rm * doesn’t remove subdirectories. OK, maybe the command was really rm -r *, but the makers of the video were trying to keep things simple and dramatic. If you type rm -r *, think four times. If it’s rm -rf *, make it at least six.
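
You can see the difference for yourself in a scratch directory. A hypothetical session (error message wording varies between systems) shows plain rm leaving a subdirectory alone:

  $ mkdir scratch && cd scratch
  $ mkdir sub && touch a.txt b.txt sub/c.txt
  $ rm *
  rm: sub: is a directory
  $ ls
  sub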

Then, instead of bringing a drive to Galyn’s house and copying the files onto it, they wrapped her computer — the one with the only copy in the world of a year’s worth of work — in blankets and drove it in a car to the studio. Cue Christine Lavin singing “What were you thinking??” But at least they had an offsite backup, even if it was by chance.

OK, I’m not being fair. It’s a video for entertainment, and they really didn’t do quite all of those stupid things, other than not verifying their backups. I hope. But there are still lessons to learn.

1. If you have critical, valuable files, keep a backup of them and occasionally verify that the backups are good. This doesn’t have to be an elaborate check. Just look at the backup drive and list the directory, making sure current backups are there and have plausible file sizes (> 0). (A quick sketch of this follows the list.)

2. Keep an offsite backup. With an operation like Pixar, security considerations apply as well; you don’t want to give backups to just anyone. But by the same token, if you’re Pixar, you can afford to make backups, move them offsite, and give them decent security.

3. If you need to restore from your last offsite backup, treat it extra carefully. If you can’t restore remotely, copy it at the offsite location and restore from the copy.

4. NEVER unplug a running computer to interrupt processing! (Sorry, that one really gets to me.)
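
For point 1, the quick check really can be quick. A sketch for a Unix-style system, with a hypothetical mount point:

  ls -lh /Volumes/Backup/projects         # are current backups there, with plausible sizes?
  find /Volumes/Backup -type f -size 0    # list any zero-byte files, which deserve suspicion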

Thanks to Mary Ellen Wessels for helping to research the video.

The challenge of too much digital stuff

Yes, you want to save all those important family photographs. There’s just one problem: They’re mixed in with thousands of pictures. Which ones do you really want to keep? Should you pick out the important stuff and save it in a collection that gets special attention? That can be a lot of work. Should you toss everything onto a terabyte drive? That saves effort, but will your grandchildren bother to go through it for the pictures that are worth remembering? Are there pictures in that big pile that you’d much rather they didn’t see?

The Library of Congress offers some advice which sounds useful:

Identify where you have digital photos

  • Identify all your digital photos on cameras, computers and removable media such as memory cards.
  • Include your photos on the Web.

Decide which photos are most important

  • Pick the images you feel are especially important.
  • You can pick a few photos or many.
  • If there are multiple versions of an important photo, save the one with highest quality.

That’s a reasonable agenda for a full-time librarian. For those of us who have other things to do, it’s a daunting challenge. It’s not unusual to have a few thousand digital photographs lying around and very little organization about them. The only clues you may have about their content and context are the file date (which can be wrong) and whatever you can gather by looking at the picture. Some of my oldest digital photographs, from 2002 or earlier, have no metadata beyond the digital characteristics of the picture; they don’t say when they were taken or even what kind of camera took them. The files have names like DSCN0128.JPG. The file with that name, incidentally, is dated “Jan 1, 2001 12:00 AM,” and it’s not from a New Year’s party. I remember that party. We toasted the New Millennium to Also Sprach Zarathustra. The picture isn’t from there.
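
If you’re curious what a file does or doesn’t know about itself, a tool like exiftool (an extra install, not something your computer ships with) will dump whatever metadata is there; a sketch:

  exiftool -DateTimeOriginal -Make -Model DSCN0128.JPG
  # tags the file doesn't have simply won't appear in the output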

But I digress. The point is, you have lots of pictures and don’t necessarily know a lot about them. Storage is cheap, and time isn’t. The best strategy may be a culling strategy; not “pick the images you feel are especially important,” but “spot the ones that are plain junk and get rid of them.”

While you’re doing that, try to give the pictures some sort of organization. This is best done on an ongoing basis rather than in one desperate plunge. My approach is to have a bunch of folders with descriptive titles, by geographic area or event or whatever. A lot of the folders have subfolders; I have a “Cons” folder with subfolders by convention name, most of those with subfolders by year. I drag pictures from the camera import folder to an appropriate folder. The ones that don’t get dragged out of the import folder eventually get deleted. Sometimes I rename the files to something descriptive and add metadata; sometimes I don’t. The result is a semi-organized collection of pictures with some stuff which other people might find interesting in years to come. Since I’m the kind of person who writes about file preservation, I have occasional bursts of fanaticism that let me improve the organization a bit.

To one degree or another, that’s the approach which makes sense for most people. Try not to lose anything important. Keep it as organized as you can on an ongoing basis. Don’t make an unnecessary effort to live up to the standards of people who work for libraries (like me). A good management tool (e.g., Adobe Bridge but not iPhoto) is a big help. It’s much easier to maintain good preservation habits than to tackle a huge organization project.

Time Machine Maintenance

Where you have an FTL blog, there must be time machines. Apple Time Machine, that is. It provides a convenient way of backing up files without thinking about it, but there are tricks to using it best and mistakes to avoid.

You really should start by understanding how Time Machine stores files. If you look at a Time Machine backup volume that’s been in use for a while, you’ll see a directory called Backups.backupdb. This is the root directory for all your backups. Under it will be a directory named for your computer. If you’re backing up more than one computer to the same drive, they’ll all be listed there. Under that will be some number of directories whose names look like timestamps, and a directory alias called “Latest.” You should see directories for every hour your machine has been on in the recent past, dwindling down to dailies when you go back a day or two, and weeklies when you look at directories more than a month old. There may be gaps, depending on how much of the time your computer is turned on.
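
On a hypothetical backup drive (the volume and computer names here are invented), the layout looks something like this:

  $ ls /Volumes/TM-Backup/Backups.backupdb/my-macbook
  2012-05-01-090434   2012-05-02-101212   2012-05-02-110655   Latest
  $ tmutil listbackups    # on OS X 10.7 and later, this command lists them too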

If you look under any of these timestamps, you’ll see what appears to be a complete copy of the contents of your drive. Add all of these copies together, and they may come to many more gigabytes of data than your whole drive holds! How is this possible? The answer is incremental backup and the Unix “hard link.” When Time Machine does a backup, only files that have been added or changed since the last backup are copied; everything else is just a hard link to the copy in an older backup. Unlike an alias, a hard link is a first-class citizen of the file system; for all practical purposes, the file is there, and it’s also where it used to be. A hard link, like a Tardis, lets your files be in two or more places at once. The nice result is that if you want to get a particular file or folder back, you just have to find it on the backup volume and copy it.
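
You can watch a hard link pull this trick on any Unix-style system; a minimal sketch:

  $ echo "hello" > original.txt
  $ ln original.txt link.txt        # a hard link, not a symbolic link (ln -s)
  $ ls -li original.txt link.txt    # same inode number, link count of 2 on each
  $ rm original.txt
  $ cat link.txt                    # the data survives as long as one link remains
  hello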

Older versions of your files, as well as deleted files, are retained in older backups. Eventually your drive will fill up, and TM will delete the oldest backups, so don’t count on it to keep anything forever.

As far as I can tell, Apple doesn’t say what happens if you shut down your computer while a backup is in progress. This help item says, for OS X 10.7, that if a backup is accidentally interrupted, “Time Machine resumes the backup where it stopped.” A test shows that if I shut down during a backup, a file with a name ending in “.inProgress” is left behind, and a backup directory appears only after the backup is complete. If Time Machine is in the middle of a long backup and you need to leave your computer, you’re probably better off letting it finish rather than making it start over.

Knowing how Time Machine works provides some clues about how to use it most effectively. It automatically excludes some files, such as caches, from backup, and you can tell it to exclude more. If your backup drive is less secure than your computer, you might want to exclude files that have serious confidential information. Time Machine handles large database files poorly, since all it knows how to do is make a new copy of a file that’s been changed. They could be another candidate for an alternative backup strategy. But make sure you have some reliable backup; those database and confidential files are probably important, and losing them completely isn’t good security!
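
Exclusions can be set in the Time Machine preference pane or, on OS X 10.7 and later, from the command line; a sketch with a hypothetical path:

  sudo tmutil addexclusion -p /Users/me/Databases/big.db   # exclude by fixed path
  tmutil isexcluded /Users/me/Databases/big.db             # confirm that it took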

With some work, you can have Time Machine back up a database monthly (or however often you like) even if you change it every day. Exclude the database file from backup, and set up an automated task such as a cron job to copy it once a month to a directory that isn’t excluded. Time Machine will back up the copy only when it changes.
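
A sketch of such a cron entry, with hypothetical paths (on a Mac, launchd is the more native scheduler, but cron works too):

  # fields: minute hour day-of-month month day-of-week command
  # copy the database at 2:00 AM on the first of every month
  0 2 1 * * cp /Users/me/Databases/big.db /Users/me/db-snapshots/big.db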

Avoid the temptation to delete files from a Time Machine backup. Because of the way directories are linked together, you might either lose more than you think or not really delete what you think you’re deleting.

Multiple backups are better than one, and a drive that isn’t usually connected to your computer is safer than one that’s always connected.

These are just a few tips for using Time Machine effectively. To learn lots more, read the articles in the links below.

Suggestions: If you have a Mac, take advantage of Time Machine to automate your backups, but use it intelligently.

Useful links:

Softly and suddenly vanished away

In the midst of the word he was trying to say,
   In the midst of his laughter and glee,
He had softly and suddenly vanished away—
   For the Snark was a Boojum, you see.

         Lewis Carroll, “The Hunting of the Snark”

Boojums are mysterious whatsits that make things disappear, not with a big crash but without your ever noticing. You may think that important files are on your disk, nice and safe; but a boojum may have taken them away, leaving digital garbage in their place. Once this happens, every backup that you make of those files is just as useless.

There are ways to guard against silent file damage, but they’re rather inconvenient and time-consuming. You can store a checksum of a file and verify the file against it periodically. You can run an integrity-checking application on each file. But realistically, are you going to do this on files that you otherwise aren’t bothering to touch?
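
The mechanics, at least, are simple; it’s the discipline that’s hard. A sketch using the stock shasum tool, with hypothetical file names:

  shasum photos/*.jpg > photos.sha1   # record a checksum for each file, once
  shasum -c photos.sha1               # later: prints OK or FAILED for each file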

There’s another kind of boojum that none of this will help against: files that just somehow disappear completely. You never know why. Maybe you absent-mindedly deleted them. Maybe they got overlooked (note the artful use of the passive voice) when you copied them to a new computer. Maybe they went to the same place missing socks go. Whatever the reason, they’re completely gone.

The best low-effort defense against boojums is to keep older backups around. There’s a limit to how many storage volumes you really want to keep, but it could be a good idea to keep that old backup drive rather than throwing it away. This can lead to other problems, of course; just where are you going to find the hardware to connect to that SCSI drive and get the files off it? When you do, just how are you going to read files in some long-forgotten format? These are issues for future posts, but let’s get back to data corruption for now.

Some file formats are safer from this kind of damage than others. Whole files don’t usually get overwritten; generally it’s just a few bits that go wrong. With some formats, a flipped bit in the wrong place makes the whole file unreadable. This is true of most compressed file formats, such as ZIP and GZIP. It’s also often true of image formats. But if a text or HTML file suffers this kind of damage, you’ll probably just lose a character or two. Other formats, like XML, might go bad but be easy to repair by visual inspection.
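
One small mercy: most compressed formats at least make the damage easy to detect, since the common tools have built-in test modes:

  gzip -t archive.tar.gz    # silent if the file is intact, complains if it isn't
  unzip -t archive.zip      # tests every member of the archive and reports errors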

The moments of biggest risk are when you do a major software upgrade and when you move to a new computer. Rewriting as much of the disk as a big installation requires, or copying everything to a new drive, can turn latent problems into disasters. You do back up before doing things like that, right? But beyond that, take some time to make sure everything you need is still there before throwing out the old machine or backup.

There are a number of ways to reduce the likelihood of file damage.

  1. Do regular backups, preferably with multiple copies.
  2. Don’t work with a file on a network drive. Edit it on a local drive, then copy it to the network volume. Likewise for flash drives (which you shouldn’t consider reliable primary storage anyway).
  3. Avoid working on a disk which is nearly full.
  4. Shut down your computer properly. Don’t force-quit applications unless you absolutely have to; and if you do, check as soon as possible whether you need to use the backup file.
  5. Run a disk verification utility periodically, letting it repair the disk if necessary. (A command-line sketch follows this list.)
  6. Don’t let malware onto your computer.
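
For point 5, on a Mac the command-line equivalent of Disk Utility’s verify button looks like this (the volume name in the second line is hypothetical, and you can't repair the volume you booted from):

  diskutil verifyVolume /               # check the startup volume
  diskutil repairVolume /Volumes/Data   # repair a volume other than the one in use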

There’s a tradeoff between redundancy and safety against damage. If you save compressed archives of whole directories, a single error might cost you everything in the directory. If you save files uncompressed, you’re more likely to lose just individual files (though directory structures and whole volumes can be damaged too). With today’s huge storage volumes, compression isn’t as important as it once was.

Sometimes I do follow my own advice. After writing all of the above, I realized I hadn’t verified my own main drive for quite a long time. I ran the Mac OS Disk Utility and verified the disk, and there were indeed errors! By rebooting from the installation DVD and running Disk Utility from there, I was able to repair the problems. (But what do you do if your computer company has started shipping its operating systems on the cheap and not providing installation discs any more — like OS X Lion?)

Suggestions: Follow the steps above as much as you can. If you’re an admin, resist pressure to throw away “redundant” backups. If you get reports of files that have turned into garbage, think about the possibility of file structure damage.

Links: