It isn’t Tuesday and this isn’t Belgium, but I’ve just learned about a risk to people’s online photo collections on LiveJournal, so it’s a topic appropriate to this blog and something that shouldn’t wait.

Update: LiveJournal has just put out a public announcement.

LiveJournal has an online photo album feature called “ScrapBook.” The most generous description of it is “serviceable.” It lets users who have Plus or paid accounts upload photographs and organize them in galleries.

Recently LiveJournal announced (on an obscure forum, in Russian) that ScrapBook would be “upgraded.” Reports have been flying that I can’t confirm, since I can’t read Russian, but there is now an announcement in English, though still with a Russian page title. The migration will be clumsy:

Once this update is deployed, it will no longer be possible for both old and new ScrapBook to co-exist for the same user. It will also not be possible to access new ScrapBook until your images from the old one have been migrated to it. This means that until your ScrapBook is migrated, you will not see the “Add Image” button on entry create/edit pages.

The only change that’s explained is a downgrade in custom security options to all-or-nothing. You may be able to restore custom settings by going through your whole collection manually afterward.

During the migration, images that were previously public will remain public. However, images that had any ‘friends only’ security level of any kind will be migrated as private images.

How do you initiate migration (leaving aside the question of why you’d want to)? You have your choice of asking LJ to do it on its own schedule or waiting for LJ to do it on its own schedule.

If you want to migrate your photos to new ScrapBook now (rather than waiting until next week for the mass migration), please leave the comment “+” to this entry.

Will links to the old ScrapBook break? It’s not clear. LJ says that “all the photos you have already uploaded will still remain visible in all entries and comments,” but it doesn’t say whether the URLs will remain valid or whether LiveJournal entries and comments will somehow be fixed up. In any case, friends-only pictures will become invisible.

If you have a LiveJournal account, make sure that you wouldn’t lose any photos if they disappeared from ScrapBook. This is good advice at all times but especially now. Please also spread the word. I certainly wouldn’t object to a link to this post. :)

Preservation Week

I was in Canada this past weekend, so this Tuesday’s post is just a link to a Library of Congress video on “Why digital preservation is important for you”, in honor of Preservation Week.

The challenge of too much digital stuff

Yes, you want to save all those important family photographs. There’s just one problem: They’re mixed in with thousands of pictures. Which ones do you really want to keep? Should you pick out the important stuff and save it in a collection that gets special attention? That can be a lot of work. Should you toss everything onto a terabyte drive? That saves effort, but will your grandchildren bother to go through it for the pictures that are worth remembering? Are there pictures in that big pile that you’d much rather they didn’t see?

The Library of Congress offers some advice which sounds useful:

Identify where you have digital photos

  • Identify all your digital photos on cameras, computers and removable media such as memory cards.
  • Include your photos on the Web.

Decide which photos are most important

  • Pick the images you feel are especially important.
  • You can pick a few photos or many.
  • If there are multiple versions of an important photo, save the one with highest quality.

That’s a reasonable agenda for a full-time librarian. For those of us who have other things to do, it’s a daunting challenge. It’s not unusual to have a few thousand digital photographs lying around with very little organization. The only clues you may have about their content and context are the file date (which can be wrong) and whatever you can gather by looking at the picture. Some of my oldest digital photographs, from 2002 or earlier, have no metadata beyond the digital characteristics of the picture; they don’t say when they were taken or even on what kind of camera. The files have names like DSCN0128.JPG. The file with that name, incidentally, is dated “Jan 1, 2001 12:00 AM,” and it’s not from a New Year’s party. I remember that party. We toasted the New Millennium to Also Sprach Zarathustra. The picture isn’t from there.

But I digress. The point is, you have lots of pictures and don’t necessarily know a lot about them. Storage is cheap, and time isn’t. The best strategy may be a culling strategy; not “pick the images you feel are especially important,” but “spot the ones that are plain junk and get rid of them.”

While you’re doing that, try to give the pictures some sort of organization. This is best done on an ongoing basis rather than in one desperate plunge. My approach is to have a bunch of folders with descriptive titles, by geographic area or event or whatever. A lot of the folders have subfolders; I have a “Cons” folder with subfolders by convention name, most of those with subfolders by year. I drag pictures from the camera import folder to an appropriate folder. The ones that don’t get dragged out of the import folder eventually get deleted. Sometimes I rename the files to something descriptive and add metadata; sometimes I don’t. The result is a semi-organized collection of pictures with some stuff which other people might find interesting in years to come. Since I’m the kind of person who writes about file preservation, I have occasional bursts of fanaticism that let me improve the organization a bit.
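Here’s a minimal Python sketch of the “eventually get deleted” step: it lists the files that have sat in an import folder past a cutoff, so you can look them over before culling. The function name, the folder path and the 90-day default are all assumptions made up for illustration, and it relies on the file modification date, which (as noted above) can be wrong. It only reports files; it doesn’t delete anything.

```python
import time
from pathlib import Path


def stale_imports(import_dir, max_age_days=90, now=None):
    """Return files directly inside import_dir whose modification time
    is more than max_age_days before `now` (default: the current time).

    Hypothetical helper for illustration; it never deletes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        p for p in Path(import_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # "Pictures/Import" is a placeholder path, not a real convention.
    folder = Path("Pictures/Import")
    if folder.is_dir():
        for path in stale_imports(folder):
            print(path)
```

Printing the candidates instead of removing them keeps the final decision with you, which fits the “spot the junk” approach better than an automatic purge.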

To one degree or another, that’s the approach which makes sense for most people. Try not to lose anything important. Keep it as organized as you can on an ongoing basis. Don’t make an unnecessary effort to live up to the standards of people who work for libraries (like me). A good management tool (e.g., Adobe Bridge but not iPhoto) is a big help. It’s much easier to maintain good preservation habits than to tackle a huge organization project.

Recovering encrypted files

Few things in digital preservation are as frustrating as finding out you can’t open an encrypted archive. It must have been something important, and now you can’t get at it at all! You might not even have a clue what the file is.

The number one cause of this situation is stupidity. Back when I was doing consulting work, I’d sometimes send my project on a disk. (This was before the Internet and high-bandwidth data connections.) As a safety measure I encrypted it and sent my client the password separately, with a strong reminder to decrypt the disk immediately on receipt.

In one of those cases, I got back a reply months later saying, “We got this disk from you a few months ago and we can’t figure out what’s on it.” I hadn’t kept the password around, so I couldn’t help them. All I could do was send them another disk. They didn’t get around to decrypting that one either.

If you’ve received an encrypted archive and a password for it, do one of two things right away: Either decrypt it and store the extracted data in a safe place, or store the password where you’re sure you won’t lose it.

Another way to lose encrypted data is to find that you have the archive and the decryption key, but no working software to do the decryption. Perhaps software using some obscure homemade scheme created the archive. The encryption is doubtless second-rate and has serious theoretical weaknesses, but that’s not much help unless you’re a topnotch cryptanalyst. Or you may just not be able to tell what encryption scheme was used; one collection of random-seeming bits looks a lot like another.

Finally, encrypted files are fragile. Accidentally changing one bit will usually make the whole file or a large chunk of it undecipherable.

Sometimes the first challenge is to find out what kind of software to use. FI Tools from Forensic Innovations claims to be able to identify encrypted file types. The file extension may also give a clue; one online list catalogs over 300 extensions used for encrypted files.
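If you don’t even know whether a mystery file is encrypted, a rough first test is to measure how random its bytes look. The Python sketch below computes the Shannon entropy of a block of bytes, in bits per byte. This is a generic heuristic, not what FI Tools or any other product does, and the function name is an assumption for illustration.

```python
import math
from collections import Counter


def byte_entropy(data):
    """Return the Shannon entropy of `data` (a bytes object) in bits per byte.

    0.0 means every byte is identical; 8.0 is the maximum, reached when
    all 256 byte values occur equally often. Illustrative helper only.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(byte_entropy(b"plain English text scores fairly low"))
```

Plain text and most uncompressed formats score well below 8 bits per byte. Encrypted and compressed data both score close to 8, so a high number narrows the possibilities without telling you which encryption scheme, if any, was used.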

Why so many? For one thing, in the nineties the US government put severe restrictions on the strength of encryption that published software, especially for export, could use. This did little to keep terrorists from using strong encryption, but it encouraged people to roll their own encryption methods. Finding software for some of these could be very tough today.

On the positive side, you may be able to break old archives created with those feeble algorithms just by throwing computational power at them. The once popular DES encryption used 56-bit keys; it’s possible to crack a DES archive on a modern computer in a matter of hours.
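To put rough numbers on that, here’s a back-of-the-envelope calculation in Python. A 56-bit key gives 2^56 possible keys, and on average a brute-force search finds the right one after trying about half of them. The search rate below is an assumption picked purely for illustration, not a measurement of any real machine; actual throughput depends heavily on the hardware.

```python
def brute_force_hours(key_bits, keys_per_second, fraction_searched=0.5):
    """Estimate the hours needed to search part of a key space.

    fraction_searched=0.5 models the average case (half the keys tried);
    use 1.0 for the worst case. Illustrative arithmetic only.
    """
    keyspace = 2 ** key_bits
    seconds = keyspace * fraction_searched / keys_per_second
    return seconds / 3600


DES_KEY_BITS = 56
ASSUMED_KEYS_PER_SECOND = 1e12  # hypothetical rate, chosen for illustration

if __name__ == "__main__":
    print("DES keys:", 2 ** DES_KEY_BITS)
    print("average hours:", brute_force_hours(DES_KEY_BITS, ASSUMED_KEYS_PER_SECOND))
```

The time scales linearly with the rate, so a machine a thousand times slower than the assumed one would need a thousand times as long. Each extra key bit doubles the work, which is why modern key lengths are out of reach for this approach.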

You have to be careful when looking for encryption-breaking software. A large part of the market is for espionage and data theft, and the people who sell to this segment aren’t the most trustworthy.

As always, preventing the problem is far easier than curing it. Keep good track of encrypted archives, store the decryption keys securely, don’t let the only person who knows the password quit, and make sure decryption software is still available.

Saving Internet information from the Memory Hole

Before releasing the name of Sergeant Robert Bales, who’s accused of a murderous rampage in Afghanistan, the US military tried to wipe information about him from the Internet. Since this is a tech blog, I won’t be talking here about why they did it or whether it was a good idea. The questions I’m addressing here are: (1) If you want to wipe some information from the Internet, can you do it? (2) If someone tries to wipe information which you suddenly realize you want, how much can you recover?

We’re talking here about the government’s deleting only information which it directly controls. Parts of Bales’ wife’s blog disappeared, but probably this happened with her cooperation. If a government can control all websites in the country, including search engines, and restrict access to those outside, it’s a very different game. Think of China and the Tiananmen Square events of 1989.

If the government hadn’t been so rushed for time, it might have done a much more effective job. Keeping Bales’ name out of the media for a week was probably pushing their limits. If they could have had another week, many search engine caches could have lapsed, making it harder but still far from impossible to find old pages.

Let’s suppose information on someone or something has been sent down the Internet Memory Hole, and you’re an investigative reporter who wants it back. How would you do it? If you do a search on Google, many of the hits will have “cached” links. This lets you look at Google’s latest cached version of the page, which may be the only available version or may have information that was recently taken out. That technique is good for information that’s not more than a few days old.

[Screenshot: Google search result showing a “cached” link]

For older information, you can look at the Internet Archive’s Wayback Machine. This site has a vast collection of old Web pages, but it’s still necessarily spotty. Usually it has only pages more than six months old, and anyone can ask not to have their pages archived.
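If you have a lot of URLs to check, you can ask the Wayback Machine programmatically. The Python sketch below builds a query for the Internet Archive’s availability endpoint and pulls the closest snapshot URL out of the JSON reply. The function names are my own, and the response layout (an “archived_snapshots” object containing a “closest” entry) is an assumption based on how the service has documented it; check the current documentation before relying on it.

```python
import json
import urllib.request
from urllib.parse import urlencode

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the query URL. `timestamp` is an optional YYYYMMDD string
    asking for the snapshot closest to that date."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from a JSON reply, or None if the reply
    reports no available snapshot. Assumes the layout described above."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def fetch_closest(page_url, timestamp=None):
    """Query the service over the network and return the snapshot URL."""
    query = availability_query(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=10) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling fetch_closest("example.com", "20120301") would ask for the capture nearest to March 1, 2012. A result of None means only that the archive reported no copy, which fits the spotty coverage described above.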

If you looked at a page a few days ago and it’s no longer there, you may have a copy cached by your browser. Going into offline mode may improve your chances of seeing the cached page rather than a 404.

There may be copies of the vanished material elsewhere on the Internet. Some people fish out cached pages that have disappeared and post them, especially if they think the deletion was an attempt to hide something. I did this myself a few years ago; in 2004 John Kerry proposed mandatory service for high school students, making it illegal for them to graduate if they didn’t satisfy a federal service requirement. This stirred up a lot of anger and the page proposing it disappeared from his website. I grabbed a cached copy from Google and posted it. That copy greatly increased my site’s hit counts for a while. If you can remember key phrases from the deleted page, using them in a search string may turn up copies.

There’s a Firefox add-on called “Resurrect Pages.” It offers to search several caches and mirrors if you get a 404 error. Another one is “ErrorZilla Plus.” I don’t have any experience with them.

Finding vanished information on the Internet is an art, and doubtless there are experts who know a lot more tricks than I’ve mentioned here.


