WG completeness

From Linked Earth Wiki


This Working Group is dedicated to developing a completeness score that quantifies metadata completeness along several dimensions. This concept emerged from the June 22-23 PDS workshop.

Group composition

Jack Williams, Kim Cobb, Kaleb Horlick, Julien Emile-Geay

Motivation


  1. Motivation for PIs to submit maximally complete datasets
  2. Motivation for data rescue missions to complete legacy datasets
  3. Accountability for PIs
  4. Increase utility (value) of records in a database
  5. Scientifically enabling: allows stratifying the database according to the availability of certain types of information (e.g. calibration metadata)
  6. Promotes a discussion on quantitative estimates of “uncertainty” (broadly defined) for proxies
  7. Completeness is an imperfect but reasonable proxy for data quality: in our experience, very complete data records tend to originate from PIs who are careful scientists all around.

Basic parameters

  1. Objective, value-neutral
  2. Only an assessment of completeness (not of "quality", which is loaded and hard to define)
  3. Archive-specific: structural uncertainties must be represented by proxy type (see e.g. the 2008 Trieste white paper); examples include frequency-dependent calibration and the divergence problem

Basic Vision

Multiple scores from 1 to 5 along several dimensions result in a spider diagram. The dimensions of the spider diagram may be:

  • availability of native data (e.g. if temperature is inferred from Mg/Ca measurements, those measurements are archived as well; for radiocarbon, availability of raw (uncorrected) dates)
  • y-uncertainty (reproducibility, degree of replication, instrument precision)
  • x-uncertainty (number of chronological tie points, their precision; description of reservoir correction/assumption about initial concentrations of radioisotopes)
  • geolocation (geographic coordinates, their precision, other geolocation features)
  • completeness of entry (e.g. Bibliography, funding metadata)
  • interpretation – how do you know what you know? If applicable, link to a forward model
  • provenance (IGSNs for physical samples, protocols, documentation of the various transformations/corrections applied)
  • more?
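The scheme above can be sketched in code. The dimension names, function names, and data layout below are hypothetical illustrations for discussion, not part of any existing LinkedEarth tooling; the only constraints taken from the vision are that each dimension gets an integer score from 1 to 5 and that the scores are meant to be displayed on a spider (radar) diagram.

```python
import math

# Hypothetical dimension list, following the draft bullets above.
COMPLETENESS_DIMENSIONS = (
    "native_data",     # availability of native measurements (e.g. raw Mg/Ca values)
    "y_uncertainty",   # reproducibility, degree of replication, instrument precision
    "x_uncertainty",   # chronological tie points and their precision
    "geolocation",     # geographic coordinates and their precision
    "entry",           # bibliography, funding metadata
    "interpretation",  # link to a forward model, if applicable
    "provenance",      # IGSNs, protocols, documented transformations/corrections
)

def validate_scores(scores: dict) -> dict:
    """Check that every dimension is present and each score is an integer 1-5."""
    missing = set(COMPLETENESS_DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim, s in scores.items():
        if not (isinstance(s, int) and 1 <= s <= 5):
            raise ValueError(f"{dim}: score must be an integer in 1..5, got {s!r}")
    return scores

def spider_coordinates(scores: dict) -> list:
    """Return (angle_in_radians, score) pairs, one per dimension,
    spaced evenly around the circle and ready to plot on a polar axis."""
    n = len(COMPLETENESS_DIMENSIONS)
    return [(2 * math.pi * i / n, scores[dim])
            for i, dim in enumerate(COMPLETENESS_DIMENSIONS)]
```

For example, a dataset scored 3 on every dimension would validate cleanly and plot as a regular heptagon at radius 3. Keeping validation separate from plotting geometry means the score record itself stays a plain, archivable mapping.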


Start with an example of a 5-star dataset that is progressively degraded to 1 star.