There is a persistent misperception that simply storing and backing up research data is equivalent to the conservation guaranteed by digital preservation. Thanks to a new awareness of the importance of research data, a new ‘data consciousness’ so to speak, it is now widely recognized that it is important to generate regular backups in multiple locations (LOCKSS – Lots of Copies Keep Stuff Safe). But simply storing data, even with multiple backups, is not enough to ensure the long-term survival of digital objects. Digital materials, which account for an increasing proportion of what humans produce as knowledge and culture, are very fragile, possibly even more fragile than physical objects.
Additionally, the amount of research data that institutions deal with on a daily basis (especially ‘born digital’ data, in other words, digitally generated data) makes it clear that simple storage and backup methods are not sufficient to ensure long-term access. Although backing up and saving multiple copies is part of the preservation process, digital preservation requires far more. Long-term preservation is a process that ensures the usability of data in spite of the technological changes that occur over time. Several aspects of the digital preservation process are presented here:
- Storage processes
Digital files and media can be damaged: a corrupted file, for example, may no longer open at all. This is where backups are helpful, but only if they are part of a process. It is important not to blindly trust backup software, but to regularly check that the recovered files are completely identical to those originally stored. Generating checksums for the stored files helps to confirm both that the files are unchanged and that all of them have been backed up.
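The checksum comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than a preservation tool: the function names (`sha256_of`, `build_manifest`, `compare`) are hypothetical, SHA-256 is assumed as the checksum algorithm, and the original and restored copies are assumed to be ordinary directories on disk.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(original: dict[str, str], restored: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing from, or differ in, the restored copy."""
    missing = sorted(name for name in original if name not in restored)
    changed = sorted(
        name
        for name in original
        if name in restored and original[name] != restored[name]
    )
    return {"missing": missing, "changed": changed}
```

In practice the manifest would be built when the data is first stored and kept alongside it. After a test restore, a second manifest is built from the recovered files and passed to `compare` together with the first; two empty lists mean that every file was backed up and none has changed.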
- File formats
File corruption is not the only danger to digital materials. The technological landscape, including file formats, also changes over time. No number of copies of a file will help if the software needed to open it no longer exists. Long-term preservation must therefore take changes to software and file formats into account and anticipate problems before they occur. For example, identifying at-risk file formats and migrating them to more widely supported formats is part of a professional preservation process.
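As a rough illustration of the first step, identifying at-risk formats, the following Python sketch takes an inventory of a directory by file extension and flags files that appear on a watch list. The extensions in `AT_RISK_EXTENSIONS` are purely hypothetical examples and the function names are invented for this sketch; a real preservation policy would rely on a maintained format registry and on tools that inspect file contents, because an extension is only a hint about the actual format.

```python
from collections import Counter
from pathlib import Path

# Hypothetical watch list for illustration only; not an authoritative
# statement about which formats are at risk.
AT_RISK_EXTENSIONS = {".wpd", ".wks", ".fla"}


def format_inventory(root: Path) -> Counter:
    """Count files under root by lower-cased extension."""
    return Counter(
        p.suffix.lower() or "(none)"
        for p in root.rglob("*")
        if p.is_file()
    )


def at_risk_files(root: Path, at_risk: set[str] = AT_RISK_EXTENSIONS) -> list[Path]:
    """List files whose extension appears on the watch list."""
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in at_risk
    )
```

The inventory shows which formats a collection actually contains, and the flagged list gives a starting point for planning migration to more widely supported formats.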
- Metadata
Metadata is important for locating data and for making it searchable in catalogues and databases. It is equally important for giving a record its context: without good metadata, the meaning of the record can be lost. If a dataset is stored without information on who created it, what research it supports, and so on, the dataset itself may be incomprehensible and useless. This information must be recorded as close to the creation of the file as possible.
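One simple way to capture such contextual information at creation time is to write a small metadata file next to each data file. The Python sketch below assumes a JSON "sidecar" file; the `DatasetRecord` fields and the `.metadata.json` naming are hypothetical choices for illustration, not a metadata standard, and a repository would normally prescribe its own schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class DatasetRecord:
    """Minimal, hypothetical set of descriptive fields for a dataset."""
    title: str
    creators: list[str]
    description: str
    project: str  # the research the dataset supports
    created: str = field(default_factory=lambda: date.today().isoformat())


def write_sidecar(data_file: Path, record: DatasetRecord) -> Path:
    """Write the record as JSON next to the data file and return its path."""
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(
        json.dumps(asdict(record), indent=2, ensure_ascii=False),
        encoding="utf-8",
    )
    return sidecar
```

Keeping the record beside the file means the context travels with the data whenever it is copied or backed up.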
It is impossible to predict which research data or materials from our digital cultural heritage will be used in the future. However, if such material is properly managed and cared for from the outset, its reuse becomes possible and the value of that reuse increases.
Simple storage, even with multiple back-ups, is not comparable to long-term preservation, because:
- Files or storage media may be damaged, resulting in data loss.
- Data may become inaccessible as technology and software change.
- Files need metadata to be discoverable and searchable.
- Each type of file needs metadata (themes, titles, authors, etc.) to identify its context.
Adapted from Natalie Harrower and Kathryn Cassidy at the Digital Repository of Ireland.