Preservation

Institutional Overview

We propose to use the University of Illinois Library's existing Medusa digital preservation and management system to preserve and provide long-term access to our datasets. Medusa is managed by the University Library's preservation librarian and is the default repository for digitization and born-digital materials produced at the University Library (Medusa FAQ, 2016). The University Library itself is a stable entity and widely regarded as one of the most significant academic research libraries in the world. The current University Librarian, John Wilkin, is well respected in the digital library community and continues to emphasize the need for robust digital solutions (Wilkin, 2016). We believe this makes the University Library an ideal storage venue with low long-term preservation risk.

Collection Overview

The collection will consist of datasets either submitted by individuals or sourced and curated by project staff. The preservation plan focuses on providing a fault-tolerant data and metadata store that will support BioCASe web-services for surfacing data to users. We anticipate relatively slow growth in the collection, with perhaps tens of datasets at first and more thereafter.

Intellectual Property Overview

We will ask contributors to sign a deed of deposit when uploading their materials; materials that have already been committed to the public domain are clear to ingest. Our terms of service will require that contributors deed the collection to the public domain for fair use. They will also emphasize the growing requirements stemming from the White House Office of Science and Technology Policy (OSTP) mandate to make government-supported research data freely available online (OSTP, 2013).

Preservation Recommendations

  1. Establish Medusa as the primary repository for the collection
  2. Use middleware to connect Medusa preservation data to BioCASe web-services for public viewing

Preservation Infrastructure Overview and Workflow

Dataset preservation is challenging because different kinds of datasets inherently contain different kinds of data. We recommend ingesting primarily flattened, plaintext, comma-delimited datasets, which have the best chance of long-term preservation. However, we also allow for more complex data.

  1. For tabular plaintext data, contributors can directly insert XML headers containing required metadata (see Organization section). This allows a single file to be submitted and enhances readability.
  2. For complex datasets containing images or complete database files that may have strong computational dependencies, data can be submitted in a ZIP archive with required metadata in a README.xml file. We will likely also require contributors to provide additional metadata describing the computational dependencies.
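
As an illustration of the second path, a pre-ingest check could confirm that a submitted archive actually carries its metadata file. The following is a minimal sketch, not part of Medusa or BioCASe; the function name check_submission is hypothetical, and it assumes README.xml sits at the top level of the archive and only needs to be well-formed XML (the required metadata fields are not checked here).

```python
import xml.etree.ElementTree as ET
import zipfile

# File name taken from the submission rule above; assumed to sit at the
# top level of the archive.
METADATA_NAME = "README.xml"


def check_submission(archive):
    """Return (ok, message) for a submitted archive.

    `archive` may be a file path or a binary file-like object.
    Hypothetical helper: it only checks that the metadata file exists
    and parses as XML, not that it contains the required fields.
    """
    if not zipfile.is_zipfile(archive):
        return False, "not a ZIP archive"
    with zipfile.ZipFile(archive) as zf:
        if METADATA_NAME not in zf.namelist():
            return False, f"missing {METADATA_NAME}"
        try:
            ET.fromstring(zf.read(METADATA_NAME))
        except ET.ParseError as exc:
            return False, f"{METADATA_NAME} is not well-formed XML: {exc}"
    return True, "ok"
```

A web form handler could call check_submission on the uploaded file and reject the submission with the returned message before anything is sent on for ingest.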

In a typical submission, the user would upload the file through a web form and go through a series of steps to add and verify metadata. Next, the file would be sent to Medusa for ingest. Medusa would create preservation metadata and cryptographic hash values for the file and then send the data to its storage function. From there, the data would be fed through middleware to BioCASe for display to content consumers.
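
To make the role of the hash values concrete, the sketch below shows how a fixity value for a file can be computed and later rechecked. It is an illustration only: the source does not state which algorithm Medusa uses, so SHA-256 is an assumption, and the function names file_digest and verify_fixity are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so that large datasets do not need to
    fit in memory. SHA-256 is an assumed default, not Medusa's
    documented choice.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, expected_hex, algorithm="sha256"):
    """Return True if the file still matches the digest recorded at ingest."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

Recording the digest at ingest and rerunning verify_fixity on a schedule is one way a repository can detect silent corruption of stored files.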
