Why? Why shouldn't URLs persist? Do LC shelf numbers for books change? Does the street address of your house or business change?
Back in 2001 I wrote an article for Library Journal and I listed a number of reasons why digital content disappears. Changing all your site's URLs just because you have a new director or you've chosen a new technology platform makes no sense to me.
Modes of digital death
The changing of the political guard is but a single cause of such disappearances. We can identify different modes of digital death:
- The New Replaces the Old: Change needn't be as dramatic as the transition of power of the U.S. presidency. Every time an organization has new information to put on a web site or CD-ROM or other digital format, there is a tendency to publish the new content and simply to overwrite or toss the old.
- Content Reorganization: Particularly in the web sphere, it is popular to reorganize content space periodically: when a new master is assigned to care for the content, when significant change in the content structure has occurred, or when people deem the content organization "stale." Many an Error 404 stems from simple reorganization.
- Death of a Sponsor: When the sponsoring organization of a collection dies, so too may its digital content. The day the Al Gore campaign conceded the 2000 election, its web site went dark. Anyone hoping to analyze documents on that site was instantly deprived of access.
- Sponsor Loses Interest: Most traditional print publishers consider the "back file" to be an asset worth protecting. Web-based publishers seem to lack this long-term view. For instance, Internet World, a print publication with a companion web site, has published since 1994. Until recently, a complete archive of back issues appeared on the web site. Now, the archive extends back only to July 1999.
- Sponsor Fears History: In many cases, corporations may consciously avoid maintaining historical documents for fear of litigation. Ford, IBM, and other major companies have been sued in recent years for infractions alleged to have taken place over 50 years ago. Corporate document retention policies therefore tend to encourage disposal of documents. As corporate knowledge moves to the company intranet, corporations will face an increasing dilemma: to what extent are we protecting ourselves from potential litigation, and to what extent are we destroying corporate knowledge?
- Lost Functionality: The disappearance of www.pub.whitehouse.gov, an MIT-developed site that offered rich search functionality, exemplifies this particular form of digital death. While NARA has preserved the raw content, the NARA search interface pales in comparison to the MIT product. Similarly, in February 2001, Deja.com (formerly DejaNews.com), which provided a searchable archive of postings to Usenet news groups, was taken over by Google. While Google acquired all of the "intellectual assets" of Deja.com, it failed to preserve the search interface, breaking thousands of hyperlinks and making searches impossible for savvy Deja users. Google promises to restore the lost functionality in time; other takeovers may not be so sensitive to user demands.
- Media Format Obsolescence: Anyone who owns data stranded on a 5¼" floppy disk knows the impact of media format changes. A newer storage technology supplants your format of choice; unless you take steps to copy old content to newer media, your data become stranded.
- Content Format Obsolescence: Data stored in a proprietary format, such as an older version of WordPerfect or an obsolete tool such as PC Write, may be completely unreadable to current versions of software. Even web content could conceivably become obsolete, as the transition to newer generations of HTML or XML (or popular proprietary formats such as Flash) renders old content unusable with newer software.
- Disaster: Whether a small-scale disaster (e.g., server meltdown) or a disaster that affects a campus (e.g., the 1994 earthquake that destroyed much of Cal State-Northridge), disaster can wipe out digital data whose sponsors have failed to provide for adequate offsite backup.
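Of the modes listed above, content reorganization is probably the easiest to guard against: a site can move its pages and still answer requests for the old addresses. What follows is only a minimal sketch of that idea, not a description of how any particular site or server does it. The paths in `REDIRECTS` and the `resolve` function are hypothetical names invented for illustration.

```python
from typing import Optional, Set, Tuple

# Hypothetical table of retired paths and where their content lives now.
# A real site would keep adding to this table at every reorganization.
REDIRECTS = {
    "/issues/98-03.html": "/archive/1998/issue3.html",   # first reorganization
    "/archive/1998/issue3.html": "/issues/1998/03/",     # second reorganization
    "/staff/director.html": "/about/leadership/",
}


def resolve(path: str, live_paths: Set[str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, location) for a requested path.

    200 if the path is still live, 301 if an old path maps (possibly
    through several reorganizations) to a live one, 404 otherwise.
    """
    if path in live_paths:
        return 200, path

    # Follow the chain of moves, guarding against accidental loops.
    seen = set()
    target = path
    while target in REDIRECTS and target not in seen:
        seen.add(target)
        target = REDIRECTS[target]

    if target != path and target in live_paths:
        return 301, target
    return 404, None


if __name__ == "__main__":
    live = {"/issues/1998/03/", "/about/leadership/"}
    for requested in ("/issues/98-03.html", "/issues/1998/03/", "/gone.html"):
        print(requested, "->", resolve(requested, live))
```

The point of the sketch is that the old address keeps working after each move; the cost is one table entry per relocated page, which is small next to the cost of every broken citation and bookmark.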