Abstract
The JISC-funded eBank-UK project (two phases since Sept. 2003) [1] has investigated the linking of primary data to other research outputs, such as published papers, within the scholarly knowledge cycle [2]. Building on the concept of open access [3], the project has focussed on the laboratory-based experimental technique of chemical crystallography at the UK National Crystallography Centre. It has constructed an institutional data repository (eCrystals) that makes available the raw, derived and results data from a crystallographic experiment [4]. Once a crystal structure has been completed, the data is uploaded into the repository and additional metadata (chemical as well as bibliographic) is associated with the dataset.
eBank-UK Phase 3 (Sept. 2006–June 2007) [5] aims to progress the establishment of a global federation of data repositories for crystallography, the “eCrystals Federation”, through a comprehensive feasibility scoping study. An important part of the study (this report) explores data curation and preservation issues, as well as sustainability, within a federation. Long-term sustainability of digital data requires, in the first instance, a policy commitment to undertake the curation and preservation duties needed to keep the data usable (and reusable) for its useful lifetime. However, such a commitment is likely to be influenced by a range of social, political, organisational, financial and technical factors. One way of assessing these factors in the context of the eCrystals data repository and federation is to consider the questions posed in the rapidly developing area of repository audit and certification.
Within the preservation community, the Reference Model for an Open Archival Information System (OAIS) (ISO 14721:2003 [14]) has established itself as an important standard, influencing the development of preservation metadata, the architecture and systems design of repositories, and conformance criteria for archival repositories. Although the OAIS standard covers a wide range of issues relating to the operating environment of an archive or repository, its concept of using Representation Information (RI) as a means of preserving access to the information content of digital objects is currently receiving significant attention [32, 34, 35]. Consequently, we devote a section to examining the RI of the content held in the eCrystals repository, in particular the variety of file formats in use.
Fundamental to preserving and curating digital information is the recording of adequate and appropriate metadata. Whilst the exact metadata to be recorded depends on the specific preservation strategy in force, there is some consensus on a core set of preservation metadata (PREMIS Data Dictionary [53]). We therefore examine the implications of this core set for the eBank-UK Metadata Application Profile [46].
The capabilities and constraints of the software platform underlying a repository are critical to the functions and services that can be provided at the application level. With this in mind, we briefly examine the ePrints.org software upon which the eCrystals data repository is built.
Given the exploratory nature of this report, we have tried to identify issues that are likely to affect the long-term preservation, curation, maintenance and sustainability of crystallography data, and in particular of the eCrystals data repository. To take this work forward in the context of a federation, we follow each major topic area with a set of recommendations, some of which overlap in scope.