Software
The databases will be organized using the BioCASe Provider Software, as advised by the preservation plan. It is 'an XML data binding middleware for publishing data from a relational database to an information network'. Once BioCASe is installed and configured for our database, the published information will be accessible as a BioCASe web service, meaning it can be retrieved with BioCASe protocol requests. The Provider Software is 'agnostic of the data model used for data publication and can be used in conjunction with any conceptual schema'. Its core component is the 'PyWrapper, an XML/CGI database interface written in Python that allows a standardized access to a variety of database management systems and arbitrarily structured databases'.
This tool was chosen because 'even though BioCASe can be used for any conceptual XML data schema, its main field of application is the publication of occurrence data from specimen or observational databases to primary biodiversity information networks such as the Biological Collection Access Service BioCASe network, a transnational network of primary biodiversity repositories'. It links together 'specimen data from natural history collections, botanical/zoological gardens and research institutions worldwide with information from huge observation databases.' The data will also appear on the Global Biodiversity Information Facility, an 'international open data infrastructure, funded by governments that allows anyone, anywhere to access data about all types of life on Earth, shared across national boundaries via the Internet.'
Metadata
The following metadata schema will be used:
- Dataset_Key
- Title
- Provider
- Description
- Date Uploaded
- Publish Date
- Dataset Purpose
- Methods of Data Capture
- Geographical Coverage
- Geographical Location
- Temporal Coverage
- Data Quality
- Data Storage Type
- File Size
- Number of Records
- Number of Species
- Additional Information
- Papers Written as a Result
- Link/URL
- Pictures
- Keywords
There will be NULL values, as not every dataset has all of this information at origin, but we will strive to maintain these standards.
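As a rough illustration of how one record under this schema could be held and written out as XML, here is a minimal Python sketch. The snake_case field names, the `dataset` root element, the choice of which fields are required, and the use of an empty element to stand for NULL are all our assumptions for the example. They are not part of BioCASe or of any agreed standard.

```python
from dataclasses import dataclass, fields
from typing import Optional
import xml.etree.ElementTree as ET


@dataclass
class DatasetMetadata:
    # Assumption: only the key and title are mandatory; everything else
    # may be missing at origin and is stored as None (NULL).
    dataset_key: str
    title: str
    provider: Optional[str] = None
    description: Optional[str] = None
    date_uploaded: Optional[str] = None
    publish_date: Optional[str] = None
    dataset_purpose: Optional[str] = None
    methods_of_data_capture: Optional[str] = None
    geographical_coverage: Optional[str] = None
    geographical_location: Optional[str] = None
    temporal_coverage: Optional[str] = None
    data_quality: Optional[str] = None
    data_storage_type: Optional[str] = None
    file_size: Optional[str] = None
    number_of_records: Optional[int] = None
    number_of_species: Optional[int] = None
    additional_information: Optional[str] = None
    papers_written_as_a_result: Optional[str] = None
    link_url: Optional[str] = None
    pictures: Optional[str] = None
    keywords: Optional[str] = None


def to_xml(record: DatasetMetadata) -> str:
    """Serialize one record; a NULL value becomes an empty element."""
    root = ET.Element("dataset")  # hypothetical root element name
    for f in fields(record):
        child = ET.SubElement(root, f.name)
        value = getattr(record, f.name)
        if value is not None:
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = DatasetMetadata(dataset_key="DS-001", title="Example survey")
    print(to_xml(example))
```

Keeping every element in the output, even when empty, makes it easy to see at a glance which fields a dataset is still missing.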
This schema was devised as a
------
*We're expecting datasets to come in various formats, but we want both a preservation copy and an accessible copy (encoded in XML).
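To make the two-copy idea concrete, the sketch below takes one incoming file and produces both copies. It assumes, purely for illustration, that the dataset arrives as a UTF-8 CSV file with a header row. The function name `make_copies`, the `preservation_` file prefix, the SHA-256 checksum, and the `records`/`record`/`field` element names are hypothetical choices of ours, not a prescribed workflow.

```python
import csv
import hashlib
import shutil
from pathlib import Path
from typing import Tuple
import xml.etree.ElementTree as ET


def make_copies(source: Path, out_dir: Path) -> Tuple[Path, Path, str]:
    """Create a byte-identical preservation copy and an XML accessible copy."""
    out_dir.mkdir(parents=True, exist_ok=True)

    # Preservation copy: the original bytes, untouched, plus a checksum
    # that can be used later to confirm the file has not changed.
    preservation = out_dir / ("preservation_" + source.name)
    shutil.copy2(source, preservation)
    checksum = hashlib.sha256(preservation.read_bytes()).hexdigest()

    # Accessible copy: each CSV row becomes a <record>, each cell a <field>.
    # The column header goes in a "name" attribute so that headers containing
    # spaces or symbols cannot produce invalid XML element names.
    root = ET.Element("records")
    with source.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            record = ET.SubElement(root, "record")
            for column, value in row.items():
                field = ET.SubElement(record, "field", name=column)
                field.text = value

    accessible = out_dir / (source.stem + ".xml")
    ET.ElementTree(root).write(str(accessible), encoding="utf-8", xml_declaration=True)
    return preservation, accessible, checksum
```

Other incoming formats would need their own reader in place of `csv.DictReader`, but the preservation step stays the same regardless of format.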