Paleoclimate Data Standards

From Linked Earth Wiki
Revision as of 23:28, 21 February 2017 by Jeg (First workshop on paleoclimate data standards)


Background

What is a standard?

EarthCube defines a standard as follows:

a public specification documenting some practice or technology that is adopted and used by a community. [..] There is a continuum starting with any documented practice in some community. If lots of people use a particular documented practice it could be adopted as a best practice. If almost everyone uses some documented practice, then it is a de facto standard.

Notice the emphasis on community and on practice. If only one person uses a technical specification, it's not a standard. If it's voted on but not applied in practice, it's worthless as well. Thus, the objective of this EarthCube activity is to propose a standard with broad community appeal and adoption.

Why do we need standards?

This is a bit like asking why we need water. Modern life would simply be unlivable without standards. Imagine having to use a separate browser for each web page you visit, or a separate power-transmission system for every appliance you use! You only have to travel to a country whose electric plugs differ from the ones your computer and phone use to appreciate what a nightmare that would be. In science, the ultimate objective of a standard is to make data understandable by others (including machines), and the derived analyses reproducible. Thus, a key objective of LinkedEarth is to promote the development of a community standard for paleoclimate data. Indeed, despite some ad-hoc gatherings among communities of interest over many years, until recently there had never been a concerted effort to produce a standard applicable to all paleoclimate observations. Given the growing role of synthesis work (e.g. PAGES2k, Shakun et al. 2012, Marcott et al. 2013, MARGO, others), it is increasingly important that a common solution be found.

Prior Work

LiPD

The Linked Paleo Data (LiPD) format embodies one part of this solution: it offers a container that can wrap tightly around a wide variety of paleoclimate datasets, providing a vessel for paleoclimate content. Other formats, of course, would be acceptable; however, there is no viable alternative currently in existence, which is why Julien and Nick had to go through the trouble of inventing such a format. Another reason to adopt it is the growing code ecosystem being developed around LiPD in Matlab, R and Python: the LiPD utilities, which allow cross-walks between all kinds of commonly used formats; GeoChronR, for the analysis of time-uncertain paleo data; and Pyleoclim, to visualize and analyze the data. Why not just stick with LiPD and call it a day, you ask? Well, LiPD's infinite flexibility is a double-edged sword: it can accommodate all manner of information, but that information may or may not align with community best practices. It is thus necessary for the community to decide on such practices. In other words, while LiPD provides a field-tested answer to the question "how should paleoclimate data be stored?", it says nothing about what should be stored: that decision is up to the community.

First workshop on paleoclimate data standards

The 2016 workshop on paleoclimate data standards (PDS workshop, for short) served as a stepping stone to initiate a broader process of community engagement and feedback elicitation, with the goal of generating such a community-vetted standard. The workshop identified a need to delineate a set of essential, recommended and desired properties for each dataset. By default, any and all information is desired. A subset of that should be recommended to ensure optimal re-use. A yet smaller subset is essential, in the sense that a paleoclimate dataset should not be acceptable without this information (for more details, see the PDS workshop page). Three additional themes emerged:

Cross-Archive Standards

Some essential data/metadata are shared among all conceivable archive types:

  • A table with at least two columns, one representing time, the other a climate indicator of some sort
  • Geolocation: coordinates, polygons, or, in cases where coordinates or polygons cannot be given, general location info
  • Source: PI, contributor, or database (the first author of the publication if published; otherwise some other person who can speak for the dataset, e.g. if it is unpublished or the first author is no longer in science)
  • Names and units of the variables in the dataset
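
As an illustration only, the four cross-archive essentials listed above could be checked programmatically along the following lines. This is a minimal sketch, not part of LiPD or of any agreed standard: the `Dataset` class, its field names, the `missing_essentials` helper and the example values are all hypothetical and invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """Hypothetical in-memory record; field names are illustrative, not LiPD keys."""
    columns: Dict[str, List[float]]                       # variable name -> values
    units: Dict[str, str] = field(default_factory=dict)   # variable name -> unit
    time_variable: Optional[str] = None   # name of the column that represents time
    location: Optional[str] = None        # coordinates, polygon, or general description
    source: Optional[str] = None          # PI, contributor, or database


def missing_essentials(ds: Dataset) -> List[str]:
    """Return a list of the essential cross-archive items that are absent."""
    problems = []
    # A table with at least two columns, one of which represents time
    if len(ds.columns) < 2:
        problems.append("table needs at least two columns")
    if ds.time_variable is None or ds.time_variable not in ds.columns:
        problems.append("no time column identified")
    # Geolocation and source may be free text in this sketch
    if not ds.location:
        problems.append("no geolocation")
    if not ds.source:
        problems.append("no source")
    # Every variable needs a non-empty unit string
    unitless = sorted(name for name in ds.columns if not ds.units.get(name))
    if unitless:
        problems.append("missing units for: " + ", ".join(unitless))
    return problems


# Made-up example values, for illustration only
record = Dataset(
    columns={"year": [1900.0, 1901.0], "d18O": [-4.2, -4.5]},
    units={"year": "yr CE", "d18O": "permil"},
    time_variable="year",
    location="tropical Pacific (general location only)",
    source="hypothetical contributor",
)
print(missing_essentials(record))
```

A dataset that passes such a check would only meet the cross-archive minimum; whatever archive-specific requirements the community agrees on would need their own checks on top of it.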

Archive-specific standards

What is needed to intelligently re-use an annually resolved marine record could be quite different from what is needed to intelligently re-use an ice core record, for instance. Therefore, these levels are archive-specific.

Legacy vs Modern datasets

The group also recognized that standards need to be more stringent for modern datasets than for legacy datasets, for which some (meta)data are sometimes impossible to procure (think: raw radiocarbon dates from a PI now deceased). Thus, for every archive and across archives, there needs to be a different set of standards for each kind. What constitutes "legacy" data is also open to interpretation, and requires a formal definition (and a vote).

Process for achieving a paleoclimate data standard

Attendees of the 2016 PDS workshop proposed that archive-centric working groups (WGs; self-assembled coalitions of knowledgeable experts) would be best positioned to elaborate and discuss the components of a data standard for their specific sub-field of paleoclimatology. It is also critical to ensure interoperability between standards to enable longitudinal (multiproxy) investigations.

This process contributes to the data stewardship initiative of our PAGES/Future Earth partners. Therefore, we are working together with PAGES to reach out to the broadest cross-section of paleoscientists and invite them to contribute to the process. The end goal is a standard to be precisely documented and adopted by LinkedEarth and PAGES. The standard will be implemented in all LinkedEarth activities and proposed for adoption by EarthCube, the Research Data Alliance, the Federation of Earth Science Information Partners, NOAA WDS-Paleo and Pangaea.

Working groups have been formed and are now being consulted to generate the backbone of a standard, which will be presented to the community at the PAGES OSM meeting in Zaragoza (May 9-13, 2017).

Standard Publication

Once the community has spoken on these matters, the decisions will be summarized in a publication.

A formal standard is a specification of some practice that is adopted by a recognized standards body. The set of formal standards and set of de facto standards intersect, but are not the same; some formal standards are not very widely used. Nonetheless, because of the community participation and rigor required to formalize the standard we recognize that they merit careful evaluation. [1]

In the internet age, a standard can be a web-based document that details all the specifications pertaining to a technical matter. However, to encourage participation and promote transparency, the LinkedEarth team decided that the standard should be published in a crowd-sourced peer-reviewed publication.

The writing process will likely take place in Authorea and synthesize the decisions taken, as well as the pertinent discussions that led to them. Pursuant to PAGES policies, authorship will be extremely inclusive and acknowledge all scientific input into the process. Anyone contributing to the discussion on developing standards for paleoclimatology (either during the 2016 PDS workshop, on the wiki, or via teleconferences and in-person exchanges) will be included in the author list.