Before releasing the name of Sergeant Robert Bales, who’s accused of a murderous rampage in Afghanistan, the US military tried to wipe information about him from the Internet. Since this is a tech blog, I won’t be talking here about why they did it or whether it was a good idea. The questions I’m addressing here are: (1) If you want to wipe some information from the Internet, can you do it? (2) If someone tries to wipe information which you suddenly realize you want, how much can you recover?

We’re talking here about the government’s deleting only information which it directly controls. Parts of Bales’ wife’s blog disappeared, but probably this happened with her cooperation. If a government can control all websites in the country, including search engines, and restrict access to those outside, it’s a very different game. Think of China and the Tiananmen Square events of 1989.

If the government hadn’t been so rushed for time, it might have done a much more effective job. Keeping Bales’ name out of the media for a week was probably pushing their limits. If they could have had another week, many search engine caches could have lapsed, making it harder but still far from impossible to find old pages.

Let’s suppose information on someone or something has been sent down the Internet Memory Hole, and you’re an investigative reporter who wants it back. How would you do it? If you do a search on Google, many of the hits will have “cached” links. This lets you look at Google’s latest cached version of the page, which may be the only available version or may have information that was recently taken out. That technique is good for information that’s not more than a few days old.
[Screenshot: Google search result showing the “cached” link]
For older information, you can look at the Internet Archive’s Wayback Machine. This site has a vast collection of old Web pages, but it’s still necessarily spotty. Usually it has only pages more than six months old, and anyone can ask not to have their pages archived.
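If you have a lot of URLs to check, you don’t have to paste them into the Wayback Machine one at a time. The Internet Archive has an availability endpoint that returns JSON describing the snapshot closest to a given date. Here’s a minimal Python sketch of how a lookup might go. The function names are my own, and the JSON layout is assumed from the endpoint’s usual response, so treat it as a starting point rather than a finished tool.

```python
import json
import urllib.parse
import urllib.request

# Availability endpoint of the Internet Archive's Wayback Machine.
API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Return (snapshot_url, timestamp) from a decoded response, or None.

    Assumes the layout {"archived_snapshots": {"closest": {...}}}.
    """
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(page_url, timestamp=None):
    """Ask the archive for the snapshot nearest to the given date."""
    query = availability_url(page_url, timestamp)
    with urllib.request.urlopen(query, timeout=10) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("example.com", "20040601")` asks for the snapshot nearest to June 1, 2004. A result of `None` means only that this archive has no copy, not that the page never existed.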

If you looked at a page a few days ago and it’s no longer there, you may have a copy cached by your browser. Going into offline mode may improve your chances of seeing the cached page rather than a 404.

There may be copies of the vanished material elsewhere on the Internet. Some people fish out cached pages that have disappeared and post them, especially if they think the deletion was an attempt to hide something. I did this myself a few years ago; in 2004 John Kerry proposed mandatory service for high school students, making it illegal for them to graduate if they didn’t satisfy a federal service requirement. This stirred up a lot of anger and the page proposing it disappeared from his website. I grabbed a cached copy from Google and posted it. That copy greatly increased my site’s hit counts for a while. If you can remember key phrases from the deleted page, using them in a search string may turn up copies.
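The key-phrase trick works best when each phrase is wrapped in double quotes, so the search engine looks for the exact wording instead of the individual words. Here’s a small Python sketch that builds such a query. The helper names are hypothetical, and the Google search URL is just the ordinary one your browser uses, not a special API.

```python
import urllib.parse


def exact_phrase_query(phrases, site=None):
    """Quote each remembered phrase; optionally limit results to one site."""
    parts = []
    for phrase in phrases:
        cleaned = phrase.strip().replace('"', "")
        if cleaned:
            parts.append('"' + cleaned + '"')
    if site:
        # The site: operator restricts results to a single domain.
        parts.append("site:" + site)
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Turn the query string into a URL you can open in a browser."""
    return base + "?" + urllib.parse.urlencode({"q": query})
```

Leave `site` unset when you’re hunting for reposted copies, since those will be on someone else’s domain.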

There’s a Firefox add-on called “Resurrect Pages.” It offers to search several caches and mirrors if you get a 404 error. Another one is “ErrorZilla Plus.” I don’t have any experience with them.

Finding vanished information on the Internet is an art, and doubtless there are experts who know a lot more tricks than I’ve mentioned here.