Keywords
Databases, Standards, Digital storage
Abstract
This article outlines past and present practice in the long-term preservation of databases. It describes one typology of databases affecting preservation methods. It also covers some outstanding challenges to database preservation and gives pointers to further reading on database preservation activity around the world.
Electronic access
The Emerald Research Register for this journal is available at www.emeraldinsight.com/researchregister
The current issue and full text archive of this journal is available at www.emeraldinsight.com/0305-5728.htm
Introduction
This article is intended to provide an introduction to some of the concepts underpinning the long-term preservation of databases and to provide a guide to the issues currently facing practitioners in the field.
People have been preserving databases for almost as long as we have been using computers. Managing large quantities of structured data was one of the earliest widespread applications of computers, following shortly after their use for code-breaking and weapons targeting. And the preservation of structured information predates the computer by many centuries, whether that information be in index cards, registers or paper forms. Yet a number of different approaches have emerged for preserving databases, for describing what has been preserved, and for providing access to the material thus preserved. Each approach has certain strengths and weaknesses; the different approaches exist partly because of different motivations for preserving the data in the first place.
In this discussion, I am using "database" to refer to any collection of structured information. Most commonly, this takes the familiar form of a more-or-less relational database, in which one or more rectangular arrays ("tables") of information exist with relationships between them. The cells of these arrays once typically contained numerical, textual or coded information, but may also contain pictures, sound or other multimedia entities, or indeed anything that is capable of being represented as a bitstream. Older databases often used other representations of information, but are capable of being represented in a relational form. Newer data collections, particularly those originating in the sciences, often use non-relational forms as well. But the majority of the issues I describe are independent of these considerations.
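For readers less familiar with the relational form, the following is a minimal sketch of it, using Python's built-in sqlite3 module. The two tables, their column names and the sample values are all hypothetical, invented purely for illustration; they are not drawn from any preserved dataset. The sketch shows two rectangular tables, a relationship between them, and cells holding textual, coded, numerical and arbitrary bitstream content.

```python
import sqlite3

# Hypothetical schema for illustration only: two tables linked by a key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE survey (
        survey_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL              -- textual value
    );
    CREATE TABLE response (
        response_id INTEGER PRIMARY KEY,
        survey_id   INTEGER NOT NULL REFERENCES survey(survey_id),
        region_code TEXT,                    -- coded value
        score       REAL,                    -- numerical value
        attachment  BLOB                     -- any bitstream (picture, sound, ...)
    );
""")

# Invented sample rows; the BLOB is three arbitrary bytes standing in for a file.
conn.execute("INSERT INTO survey VALUES (1, 'Example survey')")
conn.execute(
    "INSERT INTO response VALUES (1, 1, 'NE', 4.5, ?)",
    (bytes([0x00, 0x01, 0x02]),),
)

# The relationship between the tables is followed with a join on the shared key.
rows = conn.execute("""
    SELECT s.title, r.region_code, r.score, r.attachment
    FROM response AS r
    JOIN survey AS s ON s.survey_id = r.survey_id
""").fetchall()
```

Here the relationship is carried only by the shared survey_id value, which is one reason a preserved copy of such a database needs the table definitions as well as the cell contents.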
Much of what I say is informed by the work we undertake under contract to The National Archives in running - the National...