Difference between revisions of "Creating a LiPD file"

Latest revision as of 20:52, 23 July 2018

The most straightforward way to upload a dataset onto the wiki is to first create a LiPD file and upload it directly.

What is LiPD?

How to read a LiPD file?

What can I do with a LiPD file?

How do I get my data into LiPD?

There are two ways to get your data into a LiPD file:

Using the "online lipdifier", which is the most straightforward way for simple entries not containing ensemble tables.
For datasets with ensemble tables, we recommend using the Excel template (File:LiPDv1.2 template.xlsx), adding the appropriate ensemble table sheets, and using the Python LiPD Utilities to convert the template into a LiPD file. Make sure you are using the latest version of the template for compatibility.

General Guidelines

What goes into a LiPD file?

This is a trickier question than it appears at first. Consider two extremes: (1) every little data table could have its own LiPD file; (2) we could try and squeeze all the paleo data generated thus far into one giant LiPD file. Where is the happy medium? There are two ways to think about this:

Study Level

All data and metadata that are part of the same study should be placed in the same LiPD file. There are exceptions to this rule of thumb. For instance, if the study involves two physical samples in drastically different locations (i.e., different regimes), then each physical sample and its associated data and metadata should be placed in a separate LiPD file. In other words, if the data from each specific physical sample can be reused on their own in another study, then each sample should be placed in its own LiPD file.

Signal Level

All the paleo data recording the same environmental signal (i.e., having the same Category:Interpretation_(L)) should be placed in the same LiPD file. Again, there are exceptions, such as studies done at the same site by different groups at very different points in time. Follow-up studies where one investigator goes back to the same site to expand the dataset (e.g., longer core or higher-resolution sampling) probably warrant a new LiPD file, unless the results don't lead to any new science, in which case they might qualify more as a "replication" study and be included as a separate data table in the same LiPD file.

Examples:

All data and metadata should be in the same file for the following studies:

Lake cores from the same lake
Speleothems from the same cave
Ice cores from the same hole
Marine sediments from the same hole (IODP), same location (multi-core, piston core/gravity cores)
Corals from the same head
Trees from the same geographical region
Lake cores from different lakes but with the same climate interpretation. For instance, a regional composite.
Speleothems from different caves with the same climatology

Data and metadata should be in different files for the following studies:

Speleothems from different caves in different monsoon regimes
Lake cores from different lakes with different catchment basins
Marine sediments with different oceanographic regimes
Corals from different islands

On the whole: there are no hard and fast rules, and feedback is welcome.

What constitutes a measurement table?

Simply put, one table per physical sample. So if a study uses two speleothems, the measurements for each sample should be reported in two different tables.

A good rule of thumb is to ask: How is the data going to be reused? For instance, if radiocarbon chronologies for different cores are meant to be independent of each other, then each physical sample should get its own measurement table. On the other hand, if a composite depth is used, then the measurements for each physical sample can be placed in the same table.

Online Lipidifier

The web-based interface can be accessed at http://lipd.net/playground. Please note that there are known compatibility issues with Safari.

Instructions

Excel Template

Download the template

Downloading the Excel LiPD Template

Compatible with LiPD version 1.2: File:LiPDv1.2 template.xlsx.

Right-click on the name of the file and select 'Download Linked File'.

Important: Rename the file to be consistent with the DatasetName.

General Instructions

The template has three sheets for you to fill in: Metadata, paleo1measurementTable1, and chron1measurementTable1. An additional sheet named "list" contains ontology information and should not be edited.

There should be only one Metadata sheet per dataset!

If you need additional measurement tables, create new sheets by copying the content from paleo1measurementTable1 (or chron1measurementTable1) to new sheet(s) and name them paleo1measurementTable2, paleo1measurementTable3,... or chron1measurementTable2, chron1measurementTable3,...
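
As a minimal sketch of the naming pattern described above, the following Python helper works out the next free sheet name from a list of existing sheet names. The function name next_table_name and the regular expression are illustrative assumptions, not part of the template or of the LiPD utilities; the helper only computes a name and does not touch the Excel file.

```python
import re

# Sheet names in the template look like "paleo1measurementTable2".
SHEET_RE = re.compile(r"^(paleo|chron)(\d+)measurementTable(\d+)$")

def next_table_name(existing_sheets, kind="paleo", group=1):
    """Return the next unused measurement-table sheet name (hypothetical helper)."""
    used = []
    for name in existing_sheets:
        match = SHEET_RE.match(name)
        if match and match.group(1) == kind and int(match.group(2)) == group:
            used.append(int(match.group(3)))
    # Start at 1 when no table of this kind exists yet.
    return "{}{}measurementTable{}".format(kind, group, max(used, default=0) + 1)

sheets = ["Metadata", "paleo1measurementTable1", "chron1measurementTable1", "list"]
print(next_table_name(sheets, "paleo"))
print(next_table_name(sheets, "chron"))
```

With the default template sheets, the two calls print paleo1measurementTable2 and chron1measurementTable2.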

Example of a yellow pop-up in the LiPD Excel Template

All the fields in red are mandatory for a LiPD file to be valid. If you're unsure how to answer a question, click on the cell and a yellow pop-up will appear with directions. All the terms used in the Excel template have formal definitions that can be found on this wiki. Use the search bar to find the definition of a term and examples of how the term has been used.

Example of a drop-down menu in the LiPD Excel template

Some of the fields are drop-down menus:

You may be required to choose something already on the list (e.g., variableType).
In some instances, you can enter your own answer if no suitable option exists (e.g., a new type of proxy observation).

If a dataset only contains inferred variables:

To make the data reusable by the community, we strongly encourage you to enter your raw measurements (Category:MeasuredVariable (L)) along with their interpretation (Category:InferredVariable (L)). We are aware that this may not always be possible: for instance, when transforming a legacy dataset into LiPD format, the raw measurements may not be readily available. However, the LinkedEarth wiki (and LiPD) requires a type of archive (e.g., marine sediment), and on the wiki the type of archive is only accessible through a Category:MeasuredVariable (L).

You may wonder why that is. After all, Category:MeasuredVariable (L) and Category:InferredVariable (L) are both types of Category:Variable (L). However, remember that the LinkedEarth Ontology is designed to describe the relationships among the various categories. A measured variable is measured on the archive, while an inferred variable is inferred from a measured variable.

Therefore, one needs to create the measured variable (a dummy one with no values if necessary) on the wiki.

Let's use a practical example. If the dataset you're working with only contains Sea Surface Temperature values and not the associated Sr/Ca data that the temperature was inferred from, then create another column filled with the missing value flag for the dataset, using the Sr/Ca header. In the metadata section, only fill out the name, variableType (measured or inferred), and ProxyObservationType for the variable (in this case, Sr/Ca).
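
If you prepare your table in Python before pasting it into the template, the dummy column can be sketched as follows. This is only an illustration: the helper name add_dummy_column and the numbers are made up, and the table is represented as plain Python lists rather than anything specific to the LiPD utilities.

```python
def add_dummy_column(header, rows, new_name, missing="NaN"):
    """Return a copy of the table with one extra column filled with the missing value flag."""
    new_header = header + [new_name]
    new_rows = [row + [missing] for row in rows]
    return new_header, new_rows

# Made-up values for illustration only.
header = ["depth", "Sea Surface Temperature"]
rows = [[0.5, 27.1], [1.5, 26.8]]

new_header, new_rows = add_dummy_column(header, rows, "Sr/Ca")
print(new_header)
for row in new_rows:
    print(row)
```

The original lists are left untouched, so you can compare the table before and after adding the Sr/Ca placeholder.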

If your table contains more than 14 columns, you can insert the corresponding lines for the metadata. Make sure you copy and paste the formulas from the previous lines! If you have fewer than 14 variables, clear the content of the cells (in Excel, right-click -> Clear Contents) but DO NOT delete the rows (i.e., leave them blank). Also clear the unused headers in the table.

Fill in as many fields of the template as possible. Future generations of researchers will thank you!

Step-by-step Instructions

Note: The dataset used for the instructions is a dummy dataset. None of the values were measured.

Remember all the fields in red are mandatory.

Metadata

Example of metadata for a fictional dataset

Scroll-down menu for the archiveType

The Metadata sheet contains the metadata pertaining to the entire dataset.

Dataset Name: The standard notation used on the LinkedEarth wiki is siteName.firstAuthor.year.
ArchiveType: The type of proxy archive on which the measurements were made. This automatically sets the Category:ProxyArchive (L) to the proper type.
Original Source_URL: If the data is also stored on NOAA, PANGAEA, or with the original publication, enter the URL.
Investigators: This corresponds to the contributors on the wiki. Enter the name of anyone who has contributed to the creation of the dataset, including the authors on the publication or lab technicians involved in the study.
Publication Section:
- Authors: The authors of the publication.
- Publication title: The title of the publication.
- Journal: The journal title in which the publication appears.
- Year: The year the publication was published. Can be different from the year in which the dataset was published/created.
- Volume: The volume in the publication.
- Issue: The issue in the publication.
- Pages: The range of pages in the publication.
- Report Number: The number of the report, if applicable.
- DOI: The DOI of the publication.
- Abstract: The abstract of the article.
- Alternate citation in paragraph format: For books or any publication that doesn't fit well with the above format.
Site Information:
- Northernmost latitude (decimal degree, South negative): The wiki uses a more sophisticated approach for Category:Location (L). Enter the northernmost latitude of your site in the Excel template first, then make any appropriate corrections directly on the wiki.
- Southernmost latitude (decimal degree, South negative)
- Easternmost longitude (decimal degree, West negative)
- Westernmost longitude (decimal degree, West negative)
- elevation (m), below sea level negative
Funding Agency:
- Funding Agency Name: The name of the funding agency.
- Grant: The grant number.
- Principal Investigator: The principal investigator on the grant.
- country: The nation that funded the study.
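
The naming convention and the sign conventions above can be checked with a few lines of Python before you fill in the sheet. This is a sketch under stated assumptions: the helper names dataset_name and check_bounds are hypothetical, stripping spaces and punctuation from the name parts is our own choice rather than a documented wiki rule, and the site and author are fictional.

```python
import re

def dataset_name(site_name, first_author, year):
    """Join the parts as siteName.firstAuthor.year, dropping spaces and punctuation."""
    def clean(text):
        return re.sub(r"[^A-Za-z0-9]", "", text)
    return "{}.{}.{}".format(clean(site_name), clean(first_author), int(year))

def check_bounds(north, south, east, west):
    """Return a list of problems with the site coordinates (decimal degrees)."""
    problems = []
    for label, lat in (("northernmost", north), ("southernmost", south)):
        if not -90 <= lat <= 90:
            problems.append("{} latitude out of range: {}".format(label, lat))
    for label, lon in (("easternmost", east), ("westernmost", west)):
        if not -180 <= lon <= 180:
            problems.append("{} longitude out of range: {}".format(label, lon))
    if north < south:
        problems.append("northernmost latitude is south of southernmost latitude")
    return problems

name = dataset_name("Lake Example", "Doe", 2018)  # fictional site and author
print(name + ".xlsx")                              # file name consistent with the DatasetName
print(check_bounds(-10.5, -10.5, -75.2, -75.2))    # a point site: South and West are negative
```

An empty list from check_bounds means no problem was found; it does not replace the corrections you may still need to make on the wiki.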

Measurement Tables

MeasurementTable tab on the Excel template for conversion to the LiPD format highlighting the metadata and data section.

By default, the Excel template contains one sheet for a measurement table with the paleo information and one for the chron information. As mentioned in the general instructions, you can add as many measurement tables as necessary.

The step-by-step guide below uses the PaleoData information. The table for the chron information is virtually identical.

The Excel sheet is organized in two sections:

The top portion is reserved for the metadata associated with each variable
The bottom portion contains the data, with appropriate headers.

Data

Example of data entered in the LiPD template. Note the column headers and the missing value flag in this example. Note: Data are not from a real example.

Copy and paste your data starting in column A. The first row corresponds to your column header (variableName). Make the name human-readable and as precise as possible. Don't forget to enter the missing value flag! We recommend using NaN.
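
If your original data use a different missing value marker, you may want to convert it to NaN before pasting. The following is a minimal sketch: the function name clean_column and the list of legacy markers (-999, an empty string, "NA") are assumptions for illustration, so adjust them to whatever your dataset actually uses.

```python
def clean_column(values, legacy_flags=(-999, "", "NA")):
    """Replace legacy missing-value markers with the string "NaN"; keep other values as-is."""
    return ["NaN" if value in legacy_flags else value for value in values]

# Made-up column for illustration only.
raw = [1.2, -999, "NA", 3.4, ""]
print(clean_column(raw))
```

Whichever flag you end up using, remember to enter it in the missing value field of the sheet.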

Metadata

Example of metadata for the variables in a DataTable: name, variableType (measured variable or inferred variable), units, ProxyObservationType, InferredVariableType, TakenAtDepth, inferred from, and notes.

Each row corresponds to the metadata associated with one of the columns in the data table. If your data table contains more than 14 variables, you can insert lines below <variable14>. Make sure you copy and paste the formulas from the previous lines!

variableName: The name of the variable. It is automatically lifted from the column headers. THESE NEED TO MATCH. Do not use parentheses for anything besides units. Use the notes instead.
variableType: Use the drop-down menu to select either measured or inferred. This is required information to set the proper page category on the wiki (and therefore the associated properties).
units: The units in which the variable is expressed. If there are no units (because the quantity is a string or a ratio), write "unitless".
ProxyObservationType: If the variable is measured, select the type of proxy observation the variable belongs to. The drop-down menu contains the Category:ProxyObservation (L) types already in the LinkedEarth Ontology. If your variable is a new type of observation, enter it in the box. This will automatically create the concept in the LinkedEarth Ontology, where you can provide a definition for the new term. Although this property may seem redundant with variableName, think about it from a computer's perspective. Let's take the concrete example of a variableName set to G. ruber Mg/Ca. There are actually two pieces of information in the name: 1. The ProxyObservationType, which is Mg/Ca in this particular example, and 2. The Category:ProxySensor (L), which is Globigerinoides ruber in this example. A human can make sense of the two pieces of information; this is why we are asking for a variableName in human-readable form. However, the computer needs to place the two pieces of metadata in different categories.
InferredVariableType: If the variable is inferred, select the type of inferred variable (for instance, Sea Surface Temperature). The drop-down menu contains the various types of inferred variables already in the LinkedEarth Ontology. If your variable is a new type of inferred variable, enter it in the box. This will automatically create the concept in the LinkedEarth Ontology, where you can provide a definition for the new term.
TakenAtDepth: The wiki links each variable with an appropriate depth column using the Property:TakenAtDepth (L). The drop-down menu will automatically populate with the available variable names. Select the most appropriate column for depth information (if any). If multiple depths are reported, select one in the Excel menu. You can add more on the wiki directly.
InferredFrom: This property links the inferred variable to the measured variable from which it has been derived. If the actual values of the measured variable are not provided (for instance, in the case of a legacy dataset), add a dummy column in the DataTable as explained in the general instructions.
notes: Notes regarding the specific variable. Notes pertaining to the entire measurement table should be entered on the first row of the Excel sheet.
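
To see why the template asks for separate fields, it can help to write one variable's metadata row as a plain Python dictionary and run a few consistency checks on it. This is an illustrative sketch only: the function check_variable and its rules reflect our reading of the field descriptions above, not a check performed by the LiPD utilities, and the units shown are an example value.

```python
def check_variable(meta):
    """Return a list of problems for one variable's metadata row (a plain dict)."""
    problems = []
    vtype = meta.get("variableType")
    if vtype not in ("measured", "inferred"):
        problems.append("variableType must be 'measured' or 'inferred'")
    if vtype == "measured" and not meta.get("ProxyObservationType"):
        problems.append("measured variable needs a ProxyObservationType")
    if vtype == "inferred" and not meta.get("InferredVariableType"):
        problems.append("inferred variable needs an InferredVariableType")
    if not meta.get("units"):
        problems.append('units missing (write "unitless" if there are none)')
    return problems

# The human-readable name carries two pieces of information that are stored separately.
mgca = {
    "variableName": "G. ruber Mg/Ca",
    "variableType": "measured",
    "units": "mmol/mol",              # example units, for illustration
    "ProxyObservationType": "Mg/Ca",
    "sensorGenus": "Globigerinoides",
    "sensorSpecies": "ruber",
}
print(check_variable(mgca))
```

An empty list means the row passes these simple checks; a non-empty list names the fields to revisit in the sheet.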

Example of metadata from the interpretation field for variables in a DataTable: name, variableDetail, rank (i.e., importance) of the interpretation, basis of the interpretation, isLocal, the direction of the interpretation, and the scope of the interpretation.

Interpretation: The Interpretation category lets you describe the phenomena that drive the variable.
- Interpretation1_variable: The name of the Interpretation variable. For instance, the measured variable Mg/Ca is interpreted as Temperature, which in the LiPD framework (and by extension the LinkedEarth wiki) is an inferred variable.
- Interpretation1_variableDetail: Gives detail about the variable. In the Mg/Ca example, the variableDetail is 'sea surface'.
- Interpretation1_rank: If a variable has two (or more) possible interpretations, this property lets you rank them by importance. For instance, the D18O of coral aragonite can be interpreted both in terms of Sea Surface Temperature and sea surface D18O.
- Interpretation1_basis: The DOI of a publication with a relevant quote about the interpretation of the variable.
- Interpretation1_local: Is the interpretation local or far-field? Choose one in the drop-down menu or leave blank.
- Interpretation1_interpDirection: Part of the interpretation metadata that describes whether the interpreted environmental variable increases (positive) or decreases (negative) as the measured variable or inferred variable increases. Pick either positive or negative in the drop-down menu.
- Interpretation1_scope: Part of the interpretation that describes whether the interpretation relates to climate (e.g., Temperature), isotopes (e.g., D18O of precipitation), or ecology. Select one from the drop-down menu or enter a new one.

Note: To add an additional interpretation, copy and paste the headers, renaming them to Interpretation2_variable, Interpretation2_variableDetail, Interpretation2_rank, Interpretation2_basis, Interpretation2_local, Interpretation2_interpDirection, Interpretation2_scope... Then copy and paste the formulas for the drop-down menus.
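
The renaming pattern in the note above is mechanical, so the header names for any additional interpretation can be generated rather than typed by hand. A small sketch (the function name interpretation_headers is our own):

```python
# Field suffixes as they appear in the Interpretation1_* headers of the template.
FIELDS = ["variable", "variableDetail", "rank", "basis",
          "local", "interpDirection", "scope"]

def interpretation_headers(n):
    """Return the header names for the n-th interpretation block."""
    return ["Interpretation{}_{}".format(n, field) for field in FIELDS]

for header in interpretation_headers(2):
    print(header)
```

The printed names can be pasted into the new header cells; the drop-down formulas still have to be copied in Excel.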

Example of metadata from the calibration field for variables in a DataTable: the calibration equation, notes, calibration reference, and the associated uncertainty.

Calibration: The calibration section lets you enter information about how the measured variable is transformed into the inferred variable.
- calibration_equation: The mathematical equation used in the calibration. For instance, if using the Anand et al. (2003) [1] general equation, enter Mg/Ca = 0.38exp(0.09T).
- calibration_notes: Notes about the calibration equation.
- calibration_reference: The DOI of the publication in which the calibration appears.
- calibration_uncertainty: The value of the uncertainty associated with the calibration.
- calibration_uncertaintyType: The type of uncertainty (e.g., RMSE).
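
To make the link between the measured and the inferred variable concrete, here is a small sketch that applies the equation quoted above, Mg/Ca = 0.38exp(0.09T), and its inverse, T = ln(Mg/Ca / 0.38) / 0.09. The function names are our own, and we assume T in degrees Celsius and Mg/Ca in mmol/mol; check the calibration reference for the units that apply to your data.

```python
import math

B = 0.38  # pre-exponential constant from the quoted equation
A = 0.09  # exponential constant from the quoted equation

def mgca_from_temperature(temperature):
    """Forward equation: Mg/Ca = B * exp(A * T)."""
    return B * math.exp(A * temperature)

def temperature_from_mgca(mgca):
    """Inverse equation: T = ln(Mg/Ca / B) / A."""
    return math.log(mgca / B) / A

mgca = mgca_from_temperature(25.0)
print(round(mgca, 2))                         # Mg/Ca predicted at T = 25
print(round(temperature_from_mgca(mgca), 2))  # recovers the input temperature
```

The inverse form is what turns a measured Mg/Ca column into an inferred temperature column; the equation string itself is what goes into calibration_equation.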

Example of metadata pertaining to the proxy sensor: the genus and the species.

sensorSpecies: For organic proxy sensors such as foraminifera, trees, or mollusks, the species name.
sensorGenus: For organic proxy sensors such as foraminifera, trees, or mollusks, the genus name.

Example of metadata from the physical sample field for variables in a DataTable: name of the physical sample, an alpha-numeric identifier, the IGSN number if available, the location of the sample, the collection method.

Physical Sample
- name: The common name for the physical sample. For instance, "ODP 846".
- identifier: A particular identifier for the sample. For instance, "CAS A" and "CAS D" were used to identify two speleothem samples from the same cave in Reuter et al. (2009) [2].
- hasIGSN: The IGSN number, if available.
- housedAt: The location where the sample is currently curated. Can be the name of a laboratory or a central repository. On the wiki, this will be linked to a standard page where information about the laboratory or repository can be entered.
- collectionMethod: The method used to collect the sample (e.g., piston core, gravity core,...).

The example referenced above can be found here: File:Excel to LiPD Template TestDataset.xlsx.

Note: None of the metadata and data values in this example come from a real dataset.

Converting to LiPD

As of April 2017, the conversion to a LiPD file needs to be done in Python (a free, open-source programming language). Note that only Python versions 3.5 and later are supported.

Installing the Python LiPD utilities

In a terminal window, type:

 pip install lipd

For Python 3.6 users, if the pip command fails, use the following:

 pip3 install --egg lipd

For more information about how to use the utilities, visit the GitHub page.

Running the Python LiPD utilities

Open your favorite Python interface (we recommend Spyder, which comes with the Anaconda Python release) and type:

# Import the package
import lipd
# The following command opens a GUI to navigate to the Excel file. If you know the path, you can enter it directly inside the parentheses (in quotes).
lipd.readExcel()
# Create your LiPD file
D = lipd.excel()

# Validate your file.
# The following command checks that your file conforms to the LiPD requirements. If the validation fails, make sure that all the fields in red have been completed.
lipd.validate(D)

You can also use the online validator to validate your LiPD file!

References

[1] Anand, P., Elderfield, H., Conte, M.H. (2003). Calibration of Mg/Ca thermometry in planktonic foraminifera from a sediment trap time series. Paleoceanography, 18 (2), 1050, doi:10.1029/2002PA000846
[2] Reuter, J., Stott, L., Khider, D., Sinha, A., Cheng, H., Edwards, R.L. (2009). A new perspective on the hydroclimate variability in northern South America during the Little Ice Age. Geophysical Research Letters, 36, L21706, doi:10.1029/2009GL041051

Extra information

Credits

Users who have contributed to this Page:

Khider (55 Edits)
Jeg (5 Edits)
Michaelerb (2 Edits)
Chrisheiser (1 Edits)
