Difference between revisions of "Paleoclimate Data Standards"
(Initial version, based on notes sent by Bronwen Konecky on July 27, 2016)
Revision as of 05:15, 3 August 2016
A key objective of LinkedEarth is to promote the development of a community standard for paleoclimate data and metadata.
This page aims to summarize initial discussions from the June 22-23 PDS workshop on the subject of data and metadata standards. Notes came chiefly from Bronwen Konecky and Wendy Gross.
The Big Idea
While more (meta)data seems universally better, workshop participants identified the need to distinguish a set of essential, recommended, and desired properties for each dataset. A consensus emerged that the definition of these levels should be archive-specific, since what is needed to intelligently re-use an annually resolved marine record could be quite different from what is needed to intelligently re-use an ice core record, for instance. It was decided that archive-centric Working Groups would be best positioned to elaborate and discuss the components of a data standard for their specific sub-field of paleoclimatology.
Essential metadata
The conversation was guided by the following questions:
1. What does “essential” mean (vs. highly desirable)?
2. Who is the target audience/who are the end-users?
3. What are the scientific end-goals?
For #1, we determined that:
- "essential" = data cannot be uploaded without it (the dataset would be utterly useless if any of this information were missing)
- "Essential metadata" should mean something different for legacy datasets vs. new datasets
- "Essential" may vary by archive type
Overall, it was decided that separate "essential metadata" criteria needed to be applied to existing datasets vs. new datasets. In other words, the paleoclimate community should adopt stricter standards for what is essential in upcoming datasets, rather than being limited by the realities of old datasets.
For existing datasets, we deemed the following essential:
- at least two columns, one representing time, the other a climate indicator of some sort
- Geolocation: coordinates, polygons, or, in cases where coordinates or polygons cannot be given, general location info
- Source: PI, contributor, or database (first author of the publication if published, or some other person who can speak for the dataset if it is unpublished, if the first author is no longer in science, etc.)
- Units of variables in dataset
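As a minimal sketch of how the checklist above might be checked in practice, the following Python treats a dataset as a plain dictionary. The key names (`columns`, `role`, `units`, `geolocation`, `source`) are hypothetical choices made for illustration; the workshop notes do not define a schema.

```python
def missing_essential_legacy(dataset):
    """Return the essential items missing from a legacy dataset.

    All key names are illustrative assumptions, not a published standard.
    """
    missing = []
    columns = dataset.get("columns") or []
    roles = {col.get("role") for col in columns}

    # At least two columns: one for time, one for a climate indicator.
    if len(columns) < 2 or not {"time", "climate"} <= roles:
        missing.append("time and climate-indicator columns")

    # Coordinates, a polygon, or at least a general location description.
    geo = dataset.get("geolocation") or {}
    if not any(geo.get(k) for k in ("coordinates", "polygon", "description")):
        missing.append("geolocation")

    # A PI, contributor, or database that can speak for the dataset.
    if not dataset.get("source"):
        missing.append("source")

    # Units for every variable in the dataset.
    if not columns or any(not col.get("units") for col in columns):
        missing.append("units")

    return missing


# Made-up record, for illustration only.
record = {
    "columns": [
        {"name": "age", "role": "time", "units": "yr BP"},
        {"name": "d18O", "role": "climate", "units": "permil"},
    ],
    "geolocation": {"description": "example lake (placeholder)"},
    "source": "first author of the publication",
}
print(missing_essential_legacy(record))
```

Returning the list of missing items, rather than a bare true/false, lets an upload form tell the contributor exactly what to add.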
Required Metadata
Moving forward, we felt the paleodata community should adopt the following guidelines for what is 'essential':
- All the above criteria, plus:
- Measured material
- Archive type
- Uncertainty on measured variables (we did not get a chance to hash out what essential 'uncertainty' metadata would be)
- Depth AND age
- Age control points and other relevant age model info
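The additional requirements for new datasets could be expressed as a lookup table of required fields, sketched below. The dictionary keys (`measured_material`, `archive_type`, and so on) are hypothetical names introduced here; the notes list the concepts but not their encoding, and the form of the uncertainty metadata was left open at the workshop.

```python
# Hypothetical key names for the extra items required of new datasets,
# mapped to the wording used in the list above.
EXTRA_REQUIRED = {
    "measured_material": "measured material",
    "archive_type": "archive type",
    "uncertainty": "uncertainty on measured variables",
    "depth": "depth",
    "age": "age",
    "age_control_points": "age control points / age model info",
}


def is_empty(value):
    """True for None and for empty strings, lists, or dicts."""
    return value is None or (hasattr(value, "__len__") and len(value) == 0)


def missing_extra_for_new(dataset):
    """Return the additional items a new dataset still lacks.

    This only covers the extra items; the criteria for existing
    datasets would need to be checked as well.
    """
    return [
        label
        for key, label in EXTRA_REQUIRED.items()
        if is_empty(dataset.get(key))
    ]


# Made-up, partially filled record for illustration.
draft = {"archive_type": "lake sediment", "depth": [0.5, 1.5], "age": [120, 340]}
print(missing_extra_for_new(draft))
```

Because both depth and age are required, the check treats them as separate entries rather than accepting either one.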
Desired metadata
- age-uncertain ensembles (realizations of the timeseries X(t) for different age model paths)
- calibration ensembles (e.g. posterior draws from a Bayesian temperature calibration)
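To make the notion of an age-uncertain ensemble concrete, here is a toy sketch using only the Python standard library. It assumes Gaussian errors on the age control points, rejects draws that violate stratigraphic order, and interpolates linearly between control points. These are simplifying assumptions chosen for illustration, not the method of any particular age-modelling package, and all function and parameter names are hypothetical.

```python
import random
from bisect import bisect_right


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation at x; xs must be strictly increasing.

    Values outside the range of xs are clamped to the end points.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    frac = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + frac * (ys[i] - ys[i - 1])


def age_ensemble(sample_depths, control_depths, control_ages,
                 control_sigmas, n_members=100, seed=0):
    """Draw n_members age axes for the sampled depths.

    Each member perturbs the control-point ages with Gaussian noise
    (assumed 1-sigma errors) and keeps only draws in which age
    increases with depth.
    """
    rng = random.Random(seed)
    members = []
    while len(members) < n_members:
        draw = [rng.gauss(a, s) for a, s in zip(control_ages, control_sigmas)]
        # Reject age reversals: deeper control points must be older.
        if any(later <= earlier for earlier, later in zip(draw, draw[1:])):
            continue
        members.append(
            [interpolate(d, control_depths, draw) for d in sample_depths]
        )
    return members


# Made-up control points: depth (cm), age (yr BP), 1-sigma error (yr).
ensemble = age_ensemble(
    sample_depths=[0.0, 50.0, 100.0],
    control_depths=[0.0, 100.0],
    control_ages=[0.0, 1000.0],
    control_sigmas=[10.0, 50.0],
    n_members=20,
    seed=1,
)
print(len(ensemble), "age-model realizations")
```

Pairing the measured values X with each member's age axis gives one realization of X(t) per age model path. A calibration ensemble would be stored the same way, with one row per posterior draw.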