A case study in website risk

In an earlier post, “Whose site is it anyway?”, I discussed the risks to organizations of having their website’s eggs in one basket. Here’s a look at a situation that happened recently, omitting the names.

Organization X had a wiki for internal operations, hosted by a commercial company for a small annual fee. It had only one designated administrator, who lost interest in the organization. Well before the renewal deadline, members of the organization were aware of the issue, and there was discussion of how to back the wiki up. There’s an export function available, but only an administrator can use it. Short of that, there are tools such as HTTrack, which can download the pages as HTML but not as editable wikitext.

Discussion happened. Not much else did. A few months later, logged-in users started seeing a warning that the account would expire in a matter of days. X contacted the hosting company, asking to transfer the ownership of the account, pointing out that the name of the wiki was their legally registered name. The company said (quite properly) that they couldn’t transfer it without appropriate legal procedures. It was registered by the administrator, not the organization.

Things got a bit frantic from there. The existing administrator was getting communications, but there were conflicting messages on just what he was being asked to do. Was he supposed to reassign the account? Was he supposed to start a legal transfer of ownership, meanwhile letting the account lapse? Was he supposed to renew it and get reimbursed? Was someone at least making a backup while the clock was ticking? If he was supposed to add administrators, who would they be, and could the same scenario happen again if they left?

Fortunately, this story has a happy ending. The administrator decided to just go ahead and renew the account, leaving concerns about reimbursement for later. The new admin account was an email alias on Organization X’s domain, with multiple people assigned to it. For the time being at least, they’re out of the hole. I hope they start doing backups, of course.

iPhoto vs. preservation

How does iPhoto fit into digital preservation? Sort of like Rick Santorum in an ACLU meeting or a Yankees fan in Fenway Park. There’s at least one thing you have to use it for, and that’s importing pictures from an iOS device. But using it for anything you want to keep is a seriously bad idea.

Take a look at this article on Apple’s forums. A user asks a perfectly good question: How do you back up your iPhoto albums? The answer: “There is no album directory/folder. All the album info is stored in a data file that only has info of the album names and pointers to the actual photo files.” Look under “Pictures” in your user directory and you’ll find a folder called “iPhoto.” This may contain one or more library packages. A “package” is an invention of Apple’s intended to hide clutter from the user, as with an application that encompasses hundreds of support files the user shouldn’t have to mess with. In this case, though, it’s hiding essential information from the user. Fortunately, you can right-click it (that’s control-click for both of the people who don’t have a two-button mouse) and select “Show Package Contents.” This will bring up a window showing the contents of the package (which is just a folder in disguise); it’s rather bewildering. You can also right-click on a thumbnail in iPhoto and select “Show File” to see that file directly in the Finder.

Screen shot of an iPhoto library window

Note: A question has been raised about the screenshot, with a warning that relying on it may “lead to massive data loss.” See discussion here.

The package has a folder called “Data” which is an alias to another folder within the package; this contains your actual JPEG files, carefully hidden from you. It may also contain other packages; these represent other iPhoto albums, which aren’t even visible in the Finder. You might say they’re sub-albums, but iPhoto doesn’t show any hierarchical relationship.

As far as Adobe Bridge is concerned, the library package is a binary document which it doesn’t know how to open. This means you can’t so much as see a list of your iPhoto pictures with Bridge.

That may be just as well. According to advice on Apple’s support forums, if you try to do anything with those files, you’re apt to confuse iPhoto hopelessly. It’s like a micromanaging boss; the more you assert yourself, the deeper the hole you’re digging yourself into.
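Since the package is just a folder in disguise, a script can at least list what’s inside it without changing anything. Here’s a minimal, read-only sketch in Python. The library path and the list of image suffixes are my own assumptions for illustration, not anything Apple documents, and the function name `list_images` is hypothetical. It doesn’t follow aliases or symbolic links, so it shouldn’t list the same pictures twice.

```python
import os
from pathlib import Path

# Suffixes treated as image files; an assumption, extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def list_images(package: Path) -> list[Path]:
    """Return every image file under `package`, sorted, without modifying anything."""
    found = []
    # followlinks=False: do not descend into linked folders a second time.
    for dirpath, _dirnames, filenames in os.walk(package, followlinks=False):
        for name in filenames:
            if Path(name).suffix.lower() in IMAGE_SUFFIXES:
                found.append(Path(dirpath) / name)
    return sorted(found)


if __name__ == "__main__":
    # Hypothetical location; check where your own library package lives.
    library = Path.home() / "Pictures" / "iPhoto" / "iPhoto Library"
    if library.is_dir():
        for image in list_images(library):
            print(image)
```

This only reads. Copying the listed files somewhere else for safekeeping should be harmless, but moving, renaming, or deleting anything inside the package is exactly the kind of thing the forum advice warns against.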

iPhoto horror stories aren’t hard to find. Here are a few I located quickly by searching for “evil iPhoto” and “hate iPhoto”:

Interchange formats

Some file formats are good for long-term storage of files, because they’re likely to be usable for a long time. (“A long time” in computer terms means ten or twenty years; if you want files that really last, get a rock, a hammer, and a chisel.) These are preservation formats. There are also file formats which are good for moving files from one application to another. These are interchange formats. (See my last post, “Tied to an Application”, on why these are important.) The two have a lot of overlap but aren’t the same.

Both kinds of format share some requirements. Specifications should be publicly available. The format should represent the information without losing any. There shouldn’t be legal barriers to implementation.

The big difference is that interchange formats have to work right now. A format which is otherwise great doesn’t help most users if they can’t import it into a new application with available software. Interchange formats have to keep editing-related information. PDF is a great format for preservation, but try turning a PDF into an editable file. The results are usually disastrous if the file’s at all complex.

Is Microsoft Word a usable interchange format? Much as it makes me gag, I have to say yes in many cases. You can open Word files with quite a number of different applications. Someone got paid a lot to reverse-engineer the Word format, but the job has been done. A safer bet, though, might be to export from Word to ODF. The current version can do this natively, and the ODF Add-In on SourceForge claims to do it better. That way, whatever application is importing the file is following a published spec, leaving less room for bugs and other surprises.

RTF (Rich Text Format) is nominally an interchange format, but it’s actually a poor one. Its handling of character encoding is miserable and can result in garbled files when an application guesses wrong about a file’s encoding. It isn’t standardized.
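To see what a wrong guess about encoding does, here’s a small Python sketch. It isn’t an RTF parser; it just takes the bytes of a short string saved in one 8-bit encoding and decodes them with another. The choice of Windows-1252 and Mac Roman as the two encodings is my assumption for illustration, and `decode_with_guess` is a hypothetical helper name.

```python
def decode_with_guess(raw: bytes, guessed_encoding: str) -> str:
    """Decode raw bytes using whatever encoding the reader guessed."""
    return raw.decode(guessed_encoding, errors="replace")


original = "café"
raw = original.encode("cp1252")               # bytes as one application wrote them

right = decode_with_guess(raw, "cp1252")      # the reader guessed correctly
wrong = decode_with_guess(raw, "mac_roman")   # the reader guessed wrong

print(right)
print(wrong)
```

The ASCII letters come through either way, which is why this kind of damage often goes unnoticed until someone spots a mangled accented character.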

Don’t count on any interchange format to give you exactly the same content with a new application. There will almost always be subtle differences in formatting from one application to another. Color profiles may be treated differently. Metadata might not be 100% preserved.

Don’t use JPEG for image interchange. Its lossy compression means there will be spillage along the way. TIFF is good for getting images from one application to another.

When you export a file, keep it in the original format as well, at least till you’re sure you’ve exported it to your satisfaction. If anything goes wrong, that leaves you a chance of exporting again with better tools or settings, or if all else fails of manually moving information over.

Tied to an application

There’s nothing like a task you do every three years to remind you how transitory software can be, and how easy it is to lose files because you can no longer run the software to open and edit the files. I was reminded of this twice over as I started on the creation of the songbook for Concertino (blatant plug), a filk music convention which is held every three years in Massachusetts. I’ve been using Finale Allegro in the past, with my last upgrade being to the 2005 edition. Allegro is no longer available, and Allegro 2005 doesn’t export to any format that other software can open for editing. I have a lot of music in Allegro, so this is a dangerous problem.

The makers of Finale list an application called PrintMusic, a name which sounds really minimal, on their website. It turns out to be pretty much a renamed version of Allegro; if any features are missing I haven’t noticed so far, and it can open Allegro files though it uses a different native format. At this point I’m using it to enter songs and don’t seem to have lost anything.

This is fortunate, since after running it, Allegro will no longer run. It says it’s missing a required font. So I’m pretty much forced into buying PrintMusic at this point, and I’ll have to convert my existing files before getting stuck. The good thing is that PrintMusic can export to MusicXML, which other notation applications understand, so I can avoid being trapped again as long as I remember to export all my files.

After plain old data loss, scenarios like this are the commonest way to lose files. You use an application that works nicely for creating some important files, ignore them for a while, and then find you can’t open them with it any more. An “upgrade” in the operating system or hardware may have broken the application. Or you may have gotten an “upgrade” to the application itself, making it incompatible with the old files. (This shouldn’t happen but does.) Or in a more circuitous route, you may have switched from Application A to Application B, which opened A’s files just fine, only to discover that the latest version of B dropped that feature.

Sometimes you’re careful when switching to a new application or version, not deleting the old one till you’re sure you don’t need it, only to discover that the installation blew away some part of your computing environment that you needed. Even restoring the old application from a backup may not help if you can’t reinstall it from scratch. (Still got that serial number from four years ago?)

The best way to avoid this problem is to export important files regularly to a format that other applications can use. Often there’s an interchange format used by many applications that do similar jobs. This means you should check before committing to an application whether it has a decent export capability. Ideally, this should be an interchange format which other applications can read and edit. The exported file may be missing some fine points in the native format, and the importing application might lose more information or not format the file in quite the same way, but it’s a lot better than losing everything.
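Remembering to export everything is the hard part, so a small check can help. The following Python sketch lists native-format files that have no exported copy sitting next to them. The function name `missing_exports` is hypothetical, and the suffixes in the example (`.mus` for the native files, `.xml` for the exports) are assumptions; substitute whatever your applications actually use.

```python
from pathlib import Path


def missing_exports(folder: Path, native_suffix: str, export_suffix: str) -> list[Path]:
    """Return native files under `folder` with no same-named export beside them."""
    missing = []
    for native in folder.rglob(f"*{native_suffix}"):
        # The export is expected in the same folder, same name, different suffix.
        if not native.with_suffix(export_suffix).exists():
            missing.append(native)
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical folder and suffixes, for illustration only.
    music = Path.home() / "Documents" / "Music"
    if music.is_dir():
        for path in missing_exports(music, ".mus", ".xml"):
            print("No export yet:", path)
```

It only checks that an export file exists, not that the export is complete or current, so it’s a reminder rather than a guarantee.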

Next best is to save the visible rendering to a safe format. PDF is often a good choice, and these days most applications that can print files can “print” to PDF. If possible, export to PDF/A. Sometimes it makes more sense to export to an image file, such as TIFF. Don’t export anything but continuous-tone images to JPEG; other kinds of images just look bad with JPEG compression.

Picking a “format that isn’t prone to breaking” can be tricky. I have an application called “Checkbook”, which is a nice checking-account managing application, and it lets you export your data to QIF. This would be more valuable if Quicken hadn’t dropped QIF support back in 2005 in favor of OFX, which Checkbook can only import. There are converters from QIF to OFX, but it’s not the ideal situation.

Application reviewers don’t usually say much about exit strategies. With any application, it’s possible you’ll have to give it up someday and go to a different one to carry on the job, and you’ll want to keep using your old data. Reviewers should pay attention to applications’ export capabilities. We don’t know how useful any export format will be in the future, but an application that has one improves your chances of avoiding a cul-de-sac.