Organization

Software

The databases will be organized using the BioCASe Provider Software, as advised by the preservation plan. It is 'an XML data binding middleware for publishing data from a relational database to an information network'. After BioCASe is installed and configured for our database, the published information will be accessible as a BioCASe web service, which means it can be retrieved with BioCASe protocol requests. The Provider Software is 'agnostic of the data model used for data publication and can be used in conjunction with any conceptual schema'. The core component is the 'PyWrapper, an XML/CGI database interface written in Python that allows a standardized access to a variety of database management systems and arbitrarily structured databases'.

This tool was chosen because 'even though BioCASe can be used for any conceptual XML data schema, its main field of application is the publication of occurrence data from specimen or observational databases to primary biodiversity information networks such as the Biological Collection Access Service BioCASe network, a transnational network of primary biodiversity repositories'. The BioCASe network links together 'specimen data from natural history collections, botanical/zoological gardens and research institutions worldwide with information from huge observation databases.' The data will also appear in the Global Biodiversity Information Facility (GBIF), an 'international open data infrastructure, funded by governments that allows anyone, anywhere to access data about all types of life on Earth, shared across national boundaries via the Internet.'

Metadata

The following metadata schema will be used:

Dataset_Key
Title
Provider
Description
Date Uploaded
Publish Date
Dataset Purpose
Methods of data capture
Geographical coverage
Geographical location
Temporal Coverage
Data quality
Data Storage type
File Size
Number of Records
Number of Species
Additional Information
Papers written as a result
Link/URL
Pictures
Keywords

There will be NULL values, as not every dataset has all of this information at origin, but we will strive to maintain these standards.
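
As an illustration only, the schema above could be modelled as a record in which every field except the key may be empty. The sketch below is written in Python and is not part of the BioCASe software. The class name, the snake_case field names, the field types and the missing_fields helper are all assumptions made for this example; the plan itself only fixes the list of field labels.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetMetadata:
    """One metadata record; field names are hypothetical renderings of the schema labels."""

    dataset_key: str                              # Dataset_Key (assumed to be mandatory)
    title: Optional[str] = None                   # Title
    provider: Optional[str] = None                # Provider
    description: Optional[str] = None             # Description
    date_uploaded: Optional[str] = None           # Date Uploaded
    publish_date: Optional[str] = None            # Publish Date
    dataset_purpose: Optional[str] = None         # Dataset Purpose
    methods_of_data_capture: Optional[str] = None  # Methods of data capture
    geographical_coverage: Optional[str] = None   # Geographical coverage
    geographical_location: Optional[str] = None   # Geographical location
    temporal_coverage: Optional[str] = None       # Temporal Coverage
    data_quality: Optional[str] = None            # Data quality
    data_storage_type: Optional[str] = None       # Data Storage type
    file_size: Optional[int] = None               # File Size (unit not specified in the plan)
    number_of_records: Optional[int] = None       # Number of Records
    number_of_species: Optional[int] = None       # Number of Species
    additional_information: Optional[str] = None  # Additional Information
    papers_written: Optional[List[str]] = None    # Papers written as a result
    link_url: Optional[str] = None                # Link/URL
    pictures: Optional[List[str]] = None          # Pictures
    keywords: Optional[List[str]] = None          # Keywords


def missing_fields(record: DatasetMetadata) -> List[str]:
    """Return the names of all fields that are still NULL (None) in the record."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Example: a dataset that arrived with only a key and a title.
record = DatasetMetadata(dataset_key="ds-001", title="Example dataset")
gaps = missing_fields(record)
```

A list like gaps could be reviewed at ingest to see which values were unavailable at origin and which might still be recovered from the provider.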

This schema was devised as a

------

*We're expecting datasets to come in various formats, but we want a preservation copy and an accessible copy (encoded in XML).
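
A minimal sketch of how the two copies might be produced, assuming an incoming dataset in CSV form. It uses only the Python standard library. The function names, the XML element names (dataset, record, field) and the use of a SHA-256 checksum to verify the untouched preservation copy are assumptions for illustration, not decisions recorded in this plan.

```python
import csv
import hashlib
import io
import xml.etree.ElementTree as ET


def sha256_of(data: bytes) -> str:
    """Checksum of the original bytes, stored alongside the preservation copy."""
    return hashlib.sha256(data).hexdigest()


def csv_to_xml(csv_text: str) -> str:
    """Build the accessible copy: one <record> per row, one <field> per column."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    root = ET.Element("dataset")
    for row in reader:
        record = ET.SubElement(root, "record")
        for column, value in row.items():
            # The column header goes into an attribute, so headers containing
            # spaces or other characters never produce an invalid element name.
            field = ET.SubElement(record, "field", name=column)
            field.text = value
    return ET.tostring(root, encoding="unicode")


# Example with a small hypothetical dataset.
original_bytes = b"species,count\nParus major,3\n"
checksum = sha256_of(original_bytes)                    # for the preservation copy
accessible = csv_to_xml(original_bytes.decode("utf-8"))  # for the accessible copy
```

The preservation copy is the original file kept byte for byte, with the checksum recorded so later integrity checks can confirm it has not changed. The XML string is a derived copy that can be regenerated at any time.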