Difference between revisions of "Creating a LiPD file"

From Linked Earth Wiki
m (General Instructions: fix type)
(Step-by-step Instructions: Start metadata section)
Line 111: Line 111:
  
 
'''Note:''' The dataset used for the instructions is a dummy dataset. None of the values were measured.
 
 +
 +
Remember all the fields <span style="color:#ff0000"> in red </span> are mandatory.
 +
 +
===== Metadata =====
 +
 +
[[File:LiPD ExcelTemplate Metadata.png|thumb|400px|right|Example of metadata for a fictional dataset]]
 +
 +
[[File:LiPD ExcelTemplate ArchiveType ScrollDownMenu.png|thumb|400px|right|Scroll-down menu for the archiveType]]
 +
 +
The Metadata sheet contains the metadata pertaining to the entire dataset.
 +
 +
*''Dataset Name'': The standard notation used on the LinkedEarth wiki is siteName.firstAuthor.year.
 +
*''ArchiveType'': The type of [[:Category:ProxyArchive © | proxy archive]] on which the measurements were made. This automatically sets the [[:Category:ProxyArchive ©]] to the proper type.
 +
*''Original Source_URL'': If the data are also stored on [https://www.ncdc.noaa.gov/data-access/paleoclimatology-data/datasets NOAA], [https://www.pangaea.de PANGAEA], or with the original publication, enter the URL.
 +
*''Investigators'': This corresponds to the [[:Property:Contributor © | contributors]] on the wiki. Enter the name of anyone who has contributed to the creation of the dataset, including the [[:Property:author © | authors]] on the [[:Category:Publication © | publication]] or lab technicians involved in the study.
 +
*''Publication Section'':
 +
**''Authors'': The [[:Property:author © | authors]] of the [[:Category:Publication © | publication]].
 +
**''Publication title'': The [[:Property:title © | title]] of the [[:Category:Publication © | publication]].
 +
**''Journal'': The [[:Property:journal © | journal title]] in which the [[:Category:Publication © | publication]] appears.
 +
**''Year'': The [[:Property:year © | year]] the [[:Category:Publication © | publication]] was published. This can be different from the [[:Property:datasetDate © | year]] in which the dataset was published or created.
 +
**''Volume'': The [[:Property:volume © | volume]] in the [[:Category:Publication © | publication]]
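The siteName.firstAuthor.year convention for the ''Dataset Name'' field can be sketched in Python as follows. This is only an illustration: the function name <code>make_dataset_name</code> is hypothetical, and stripping spaces and punctuation from each part is an assumption, not a documented LinkedEarth rule.

```python
def make_dataset_name(site_name, first_author, year):
    """Build a dataset name of the form siteName.firstAuthor.year."""
    def clean(text):
        # Assumption for illustration: keep only letters and digits,
        # so that "Dummy Lake" becomes "DummyLake".
        return "".join(ch for ch in text if ch.isalnum())

    return f"{clean(site_name)}.{clean(first_author)}.{int(year)}"


# Fictional values, in the spirit of the dummy dataset used on this page.
print(make_dataset_name("Dummy Lake", "Smith", 2017))  # DummyLake.Smith.2017
```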
  
 
=== Converting to LiPD ===
 

Revision as of 17:53, 11 April 2017

The most straightforward way to upload a dataset onto the wiki is to first create a LiPD file and upload it directly.

What is LiPD?

LiPD (Linked Paleo Data) is a convenient format for storing and exchanging paleoclimate data, and it provides the backbone of the LinkedEarth edifice. LiPD is closely aligned with the LinkedEarth Ontology; changes in one are mirrored in the other.

How to read a LiPD file?

LiPD was designed so that it can capture much richer sets of (meta)data than ASCII or Excel files and to have a fixed backbone around which scientific codes can be built. There is a price to pay for this power: LiPD is undoubtedly more difficult to interact with than a plain text file. Although it is possible to unzip a LiPD file and navigate through the native JSON-LD and csv files, this is not the best way to harness the power of LiPD files.
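For readers who do want to peek inside a file, the unzip-and-navigate route can be sketched with Python's standard library alone. The sketch below only lists the archive members grouped by extension. It assumes the LiPD file is an ordinary zip archive, as the paragraph above implies; the function name and the example file name are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath


def list_lipd_members(lipd_path):
    """Group the members of a LiPD (zip) archive by file extension."""
    groups = {}
    with zipfile.ZipFile(lipd_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            suffix = PurePosixPath(name).suffix.lower() or "(none)"
            groups.setdefault(suffix, []).append(name)
    return groups


# Example call (hypothetical file name):
# members = list_lipd_members("MyDataset.lpd")
# print(members.get(".jsonld"), members.get(".csv"))
```

This is handy for a quick inspection, but the wiki and the dedicated utilities mentioned below remain the more convenient way to work with the content.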

The easiest way to interact with a LiPD file is by using this very wiki, which allows you to navigate the hierarchical structure of the file easily.

In addition, we have developed several utilities to read and write LiPD files in Matlab, Python, and R.

What can I do with a LiPD file?

LiPD was designed to facilitate coding around paleoclimate data. We have already developed software in R and Python to analyze and visualize paleoclimate data.

In addition, CSciBox (an integrated system for age-model reconstruction) makes use of LiPD.

How do I get my data into LiPD?

As of April 2017, the most efficient way to get your paleoclimate dataset into LiPD format is to fill out our template (File:LiPDv1.2 template.xlsx) and use the Python LiPD Utilities to convert the template into a LiPD file. Make sure you are using the latest version of the template for compatibility.

By the end of 2017, a web-based interface should be able to automate a lot of the manual steps.

General Guidelines

What goes into a LiPD file?

This is a trickier question than it appears at first. Consider two extremes: (1) every little data table could have its own LiPD file; (2) we could try and squeeze all the paleo data generated thus far into one giant LiPD file. Where is the happy medium? There are two ways to think about this:

Study Level

All data and metadata that are part of the same study should be placed in the same LiPD file. There are exceptions to this rule of thumb. For instance, if the study involves two physical samples in drastically different locations (i.e., different regimes), then each physical sample and its associated data and metadata should be placed in a separate LiPD file. In other words, if the data from each specific physical sample can be reused on their own in another study, then each sample should be placed in its own LiPD file.

Signal Level

All the paleo data recording the same environmental signal (i.e., having the same Category:Interpretation_©) should be placed in the same LiPD file. Again, there are exceptions, such as studies done at the same site by different groups and at very different points in time. A follow-up study in which one investigator goes back to the same site to expand the dataset (e.g., a longer core or higher-resolution sampling) probably warrants a new LiPD file, unless the results don't lead to any new science, in which case it qualifies more as a "replication" study.

Examples:


All data and metadata should be in the same file for the following studies:

  • Lake cores from the same lake
  • Speleothems from the same cave
  • Ice cores from the same hole
  • Marine sediments from the same hole (IODP), same location (multi-core, piston core/gravity cores)
  • Corals from the same head
  • Trees from the same geographical region
  • Lake cores from different lakes but with the same climate interpretation. For instance, a regional composite.
  • Speleothems from different caves with the same climatology

Data and metadata should be in different files for the following studies:

  • Speleothems from different caves in different monsoon regimes
  • Lake cores from different lakes with different catchment basins
  • Marine sediments with different oceanographic regimes
  • Corals from different islands.

On the whole: there are no hard and fast rules, and feedback is welcome.

What constitutes a measurement table?

Simply put, one table per physical sample. So if a study uses two speleothems, the measurements should be reported in two different tables, one for each sample.

A good rule of thumb is to ask: How are the data going to be reused? For instance, if radiocarbon chronologies for different cores are meant to be independent of each other, then each physical sample should get its own measurement table. On the other hand, if a composite depth is used, then the measurements for each physical sample can be placed in the same table.
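The "one table per physical sample" rule can be illustrated with a short Python sketch that splits a flat list of measurements into one table per sample. The field names (<code>sample</code>, <code>depth</code>, <code>d18O</code>) and the values are made up for the example; they are not part of the template.

```python
from collections import defaultdict


def split_by_sample(records, sample_key="sample"):
    """Return one list of records (one table) per physical sample."""
    tables = defaultdict(list)
    for record in records:
        tables[record[sample_key]].append(record)
    return dict(tables)


# Dummy values for illustration only; nothing here was measured.
measurements = [
    {"sample": "speleothemA", "depth": 1.0, "d18O": -5.1},
    {"sample": "speleothemB", "depth": 1.0, "d18O": -4.8},
    {"sample": "speleothemA", "depth": 2.0, "d18O": -5.3},
]

for sample, rows in split_by_sample(measurements).items():
    print(sample, len(rows))
```

When a composite depth is used, this split is unnecessary and the records can stay in a single table.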

Excel Template

Download the template

Downloading the Excel LiPD Template

Compatible with LiPD version 1.2: File:LiPDv1.2 template.xlsx.

Right-click on the name of the file and select 'Download Linked File'.

General Instructions

The template has three sheets to fill in: Metadata, paleo1measurementTable1, and chron1measurementTable1. A fourth sheet, named "list", contains ontology information and should not be edited.

There should only be one Metadata sheet per dataset!

If you need additional measurement tables, create new sheets by copying the content of paleo1measurementTable1 (or chron1measurementTable1) into new sheet(s) and name them paleo1measurementTable2, paleo1measurementTable3, ... or chron1measurementTable2, chron1measurementTable3, ...
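The sheet-naming pattern can be generated programmatically, for instance to check that a workbook contains the expected sheets. The helper below is a hypothetical sketch; it only reproduces the names quoted above and is not part of the LiPD utilities.

```python
def measurement_sheet_names(kind, count):
    """List sheet names such as paleo1measurementTable1 ... TableN.

    `kind` is "paleo" or "chron"; `count` is the number of tables needed.
    """
    if kind not in ("paleo", "chron"):
        raise ValueError("kind must be 'paleo' or 'chron'")
    return [f"{kind}1measurementTable{i}" for i in range(1, count + 1)]


print(measurement_sheet_names("paleo", 3))
print(measurement_sheet_names("chron", 2))
```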

Example of a yellow pop-up in the LiPD Excel Template

All the fields in red are mandatory for a LiPD file to be valid. If you're unsure how to answer a question, click on the cell and a yellow pop-up will appear with directions. All the terms used in the Excel template have formal definitions that can be found on this wiki. Use the search bar to access the definition of a term and examples of how the term was used.

Example of a drop-down menu in the LiPD Excel template

Some of the fields are drop-down menus:

  1. You may be required to choose something already on the list (e.g., variableType).
  2. In some instances, you can add your own answer if the list does not offer a suitable option (e.g., a new type of proxy observation).

If a dataset only contains inferred variables:

To make the data reusable by the community, we strongly encourage you to enter your raw measurements (Category:MeasuredVariable ©) along with their interpretation (Category:InferredVariable ©). However, we are aware that this may not always be possible. For instance, when transforming a legacy dataset into LiPD format, the raw measurements may not be readily available. The LinkedEarth wiki (and LiPD) nonetheless requires a type of archive (e.g., marine sediment), and on the wiki the type of archive is only accessible through a Category:MeasuredVariable ©.

You may wonder why that is. After all, both Category:MeasuredVariable © and Category:InferredVariable © are types of Category:Variable ©. However, remember that the LinkedEarth Ontology is designed to describe the relationships among the various categories. A measured variable is measured on the archive, while an inferred variable is inferred from a measured variable.

Therefore, one needs to create the measured variable (a dummy one with no values if necessary) on the wiki.

Let's use a practical example. If the dataset you're working with only contains Sea Surface Temperature values and not the associated Sr/Ca data from which the temperature was inferred, then create another column filled with the missing value flag for the dataset, using the Sr/Ca header. In the metadata section, only fill out the name, variableType (measured or inferred), and ProxyObservationType for the variable (in this case, Sr/Ca).
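The same idea, sketched in Python for a table held as a list of rows: a dummy Sr/Ca column is appended and filled with the missing value flag. The flag "NaN", the function name, and the numbers are assumptions made for the illustration; use whatever missing value flag your dataset declares.

```python
MISSING_VALUE = "NaN"  # assumption: replace with the flag your dataset uses


def add_dummy_measured_column(rows, header, missing_value=MISSING_VALUE):
    """Return a copy of the table with one extra column filled with the flag.

    `rows` is a list of lists whose first entry holds the column headers.
    """
    if not rows:
        return []
    head, *body = rows
    return [list(head) + [header]] + [list(row) + [missing_value] for row in body]


# Made-up values for illustration only; nothing here was measured.
table = [["year", "SST"], [1990, 27.1], [1991, 27.4]]
extended = add_dummy_measured_column(table, "Sr/Ca")

# Metadata to fill out for the dummy column, following the text above.
dummy_metadata = {
    "name": "Sr/Ca",
    "variableType": "measured",
    "ProxyObservationType": "Sr/Ca",
}

for row in extended:
    print(row)
```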

If your table contains more than 14 columns, you can insert the corresponding lines for the metadata. Make sure you copy and paste the formulas from the previous lines!

Fill in as many fields of the template as possible. Future generations of researchers will thank you!

Step-by-step Instructions

Note: The dataset used for the instructions is a dummy dataset. None of the values were measured.

Remember all the fields in red are mandatory.

Metadata
Example of metadata for a fictional dataset
Scroll-down menu for the archiveType

The Metadata sheet contains the metadata pertaining to the entire dataset.

Converting to LiPD

As of April 2017, the conversion to a LiPD file needs to be done in Python (a free, open-source computing language).

Installing the Python LiPD utilities

In a terminal window, type:
 pip install lipd 

For more information about how to use the utilities, visit the GitHub page.

Running the Python LiPD utilities

Open your favorite Python interface (we recommend Spyder, which comes with the Anaconda Python release) and type:

#Import the package (Python names are case-sensitive, so the module name must match the calls below)
import lipd
#The following command triggers a GUI to navigate to the Excel file. If you know the path, you can enter it directly in the parentheses (using quotes)
lipd.readExcel()
#Create your LiPD file
lipd.excel()

#Validate your file.
#The following command triggers a GUI to navigate to the newly created LiPD file. If you know the path, you can enter it directly in the parentheses (using quotes)
lipd.readLiPD()
#The following command validates your file to make sure that it conforms to the LiPD requirements. If the validation step fails, make sure that all the fields in red have been completed.
lipd.validate()