Files that Last goes to the proofreader tomorrow, but just today I came across a story that I wanted to add to it. Rather than mess with the existing text, I’m entering it as an appendix. Here it is, as it currently stands.

Just a day before this book is due to go to the proofreader, I’ve come across the story of a person who really exemplifies the term “preservation geek.” A story on Reason magazine’s website, “Amateur Beats Gov’t at Digitizing Newspapers: Tom Tryniski’s Weird, Wonderful Website,” tells of a retired computer engineer, Tom Tryniski, who has digitized over 22 million newspaper pages, many dating back to the 19th century, and made them available on his website. It’s a truly ugly, Flash-based site, but that’s not the point here. What the site shows is that big budgets and formal training in library science aren’t necessary for doing valuable preservation work.

Tryniski started by digitizing old postcards for neighbors in Fulton, New York. Then he spent a year digitizing the entire run of the Oswego Valley News by hand on a flatbed scanner. In 2003 he got a microfilm scanner at a fire sale and started obtaining microfilms of newspapers from libraries and historical societies in exchange for the digitized copies. He covers the site’s expenses himself, apparently less than $1,000 a month. The setup sounds very fragile; he has a “server that’s located in a gazebo on his front deck,” and the article doesn’t say a word about offsite backup for his growing farm of computers and drives. If anything bad happens to him or his house, the whole archive might vanish.

What one person does, though, someone else can do better. It would take more money, but not a lot more, to set up a better server environment and a secondary backup, and it would just take a little taste and programming skill to set up a better-looking site.

The article raises the question of how much supporting metadata is needed:

Asked for the rationale behind this byzantine system, a spokesperson for the NEH denied that breaking up the funding into small grants drives up costs, adding that the goal is partially to teach small libraries how to digitize newspapers in accordance with the Library of Congress’ “high technical” standards. That way they’ll be able to take that know-how and apply it to other projects.

But [Brian] Hansen [the general manager of iArchives] says the Library of Congress’ detailed specifications for analyzing each newspaper page are of questionable value to users and a major reason his firm has to charge so much.

“Why not use the money for a lighter index to get more pages online? It would be interesting to sit down with the Library of Congress and the NEH and have a conversation about what’s the best thing we can do for consumers,” says Hansen.

Even so, less than one-third of the funding goes to the actual scanning and indexing by firms like iArchives. The NEH says the remaining money—more than $2 per newspaper page—goes for “identification and selection of the files to be digitized, metadata creation, cataloguing, reviewing files for quality control, and scholarship on the scope, content and significance of each digitized newspaper title, and in some cases specialized language expertise.”

Certainly there’s value in all that information, but it adds cost. The approach the Library of Congress takes isn’t necessarily the approach you should take as a Level 1 archivist with a server in a gazebo. Having a little information on a lot of newspaper pages is in some ways better than having a lot of information on relatively few pages.

There are high and low roads, and the efforts of eager amateurs can make a significant contribution to the retention of information. Preservation geeks, go forth and archive!