Difference between revisions of "Paleoclimate Data Standards"

From Linked Earth Wiki
(Required Metadata: Rename the section to recommended)
(re-organized to make this more PDS, not just the PDS workshop)
= Background =
Modern life would simply be unlivable without standards. You only have to travel to a country that uses a different electric plug shape than yours to understand this.
 
A key objective of LinkedEarth is to promote the development of a community standard for paleoclimate data and metadata.
 
  
This page aims to summarize initial discussions from the [http://linked.earth/event/paleoclimate-ontology-workshop/ June 22-23 PDS workshop] on the subject of data and metadata standards. Notes came chiefly from Bronwen Konecky and Wendy Gross.
 
 
== The Big Idea ==
 
 
While more (meta)data seems universally better, workshop participants identified the necessity to '''distinguish a set of essential, recommended and desired properties''' for each dataset. A consensus emerged that the definition of these levels should be archive-specific, as what is needed to intelligently re-use a marine annually resolved record could be quite different from what is needed to intelligently re-use an ice core record, for instance. It was decided that archive-centric [[:Category:Working Group | working groups]] would be best positioned to elaborate and discuss the components of a data standard for their specific sub-field of paleoclimatology.
 
 
== Essential metadata ==
 
 
The conversation was guided by the following questions:
 
 
# What does “essential” mean (vs. highly desirable)?
 
# Who is the target audience/who are the end-users?
 
# What are the scientific end-goals?   
 
 
For #1, we determined that:
 
* "Essential" = data cannot be uploaded without it (the dataset would be utterly useless if any of this information were missing)
 
* "Essential metadata" should mean something different for legacy datasets vs. new datasets
 
* "Essential" may vary by archive type
 
 
Overall, it was decided that separate "essential metadata" criteria needed to be applied to existing datasets vs. new datasets. In other words, the paleoclimate community should adopt stricter standards for what is essential in upcoming datasets, rather than being limited by the realities of old datasets.
 
 
For existing datasets, we deemed the following essential:
 
 
* '''A table with at least two columns''', one representing time, the other a climate indicator of some sort
 
* '''Geolocation''': coordinates, polygons, or (in cases where coordinates or polygons cannot be given) general location info
 
*  '''Source''': PI, contributor, or database (first author of publication if published, or some other person who can speak for the dataset if not published/if the first author is no longer in science/etc)
 
* '''Names''' and '''Units''' of the variables in the dataset
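
The four essential items listed above for existing datasets can be read as a simple checklist. The following is a minimal sketch of such a check, not part of any LinkedEarth or LiPD tool: the class names <code>Column</code> and <code>LegacyDataset</code>, the function <code>missing_essentials</code>, and the example record are all hypothetical and chosen for illustration only. It also assumes that a unitless variable is declared explicitly (e.g. "unitless") rather than left blank.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Column:
    """One column of the data table (hypothetical structure)."""
    name: str                      # variable name, e.g. "age"
    units: str                     # e.g. "yr BP"; assumed non-empty even if "unitless"
    values: List[float] = field(default_factory=list)


@dataclass
class LegacyDataset:
    """An existing dataset described by the essential items only."""
    columns: List[Column]          # time plus at least one climate indicator
    geolocation: Optional[str]     # coordinates, polygon, or general location info
    source: Optional[str]          # PI, contributor, or database


def missing_essentials(ds: LegacyDataset) -> List[str]:
    """Return the essential items that ds lacks; an empty list means none are missing."""
    problems = []
    if len(ds.columns) < 2:
        problems.append("table needs at least two columns")
    if not ds.geolocation:
        problems.append("geolocation")
    if not ds.source:
        problems.append("source")
    for i, col in enumerate(ds.columns):
        if not col.name.strip():
            problems.append(f"name of column {i}")
        if not col.units.strip():
            problems.append(f"units of column {i}")
    return problems


# Made-up example record: the source and the units of the second column are absent.
record = LegacyDataset(
    columns=[Column("age", "yr BP", [100.0, 200.0]), Column("d18O", "", [-1.2, -1.4])],
    geolocation="general location description",
    source=None,
)
print(missing_essentials(record))
```

Returning a list of missing items, rather than a single pass/fail flag, tells a contributor exactly what to supply before upload. Archive-specific working groups could extend such a checklist with their own criteria.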
 
  
== Recommended Metadata ==

Moving forward, the paleodata community should adopt the following guidelines for what is 'essential':

* All the above criteria, plus:
* Archive type
* Measured material (i.e. the physical medium on which the measurement was made)
* Uncertainty on measured variables (at the very least, analytical uncertainty on raw measurements)
* Depth as well as age
* Native observations (e.g. Mg/Ca measurements, BEFORE calibration to temperature)
* Age control points and other relevant age model info

== Desired metadata ==

* Age-uncertain ensembles (realizations of the timeseries X(t) for different age model paths)
* Calibration ensembles (e.g. posterior draws from a Bayesian temperature calibration)

= Process =

The work done on LiPD, which closely mirrors our ontology, provides a stepping stone for this effort. Building on this, the [[:Category:PDS workshop 2016 | 2016 workshop on paleoclimate data standards]] served as a starting point for a broader process of community engagement and feedback elicitation to generate a community-vetted standard. The workshop identified the necessity to distinguish a set of essential, recommended and desired properties for each dataset. A consensus emerged that these levels are archive-specific, as what is needed to intelligently re-use a marine annually resolved record could be quite different from what is needed to intelligently re-use an ice core record, for instance. It was decided that archive-centric working groups (WGs; self-assembled coalitions of knowledgeable experts) would be best positioned to elaborate and discuss the components of a data standard for their specific sub-field of paleoclimatology. It is also critical to ensure interoperability between standards to enable longitudinal (multiproxy) investigations.

This process contributes to the data stewardship initiative of our PAGES/Future Earth partners. Therefore, we are working together with PAGES to reach out to the broadest cross-section of paleoscientists and invite them to contribute to the process. The end goal is a standard to be precisely documented and adopted by LinkedEarth and PAGES. The standard will be implemented in all LinkedEarth activities and proposed for adoption by EarthCube, the Research Data Alliance, the Federation of Earth Science Information Partners, NOAA WDS-Paleo and Pangaea.

= Publications =

A scholarly product will be a peer-reviewed publication presenting the standard and detailing the decisions that led to it. Pursuant to PAGES policies, authorship will be extremely inclusive and acknowledge all scientific input into the process.

Revision as of 16:37, 21 February 2017
