Using LiPD files

From Linked Earth Wiki
Revision as of 23:19, 23 June 2017 by Michaelerb (Talk | contribs) (LiPD Utilities)

Note: This page is a work in progress (more so than most wiki pages).

After downloading a LiPD file, you can use it in several ways. The recommended ways are the LiPD Utilities and Pyleoclim.

LiPD Utilities

The LiPD Utilities are a primary way to interact with LiPD files. They are available on GitHub for MATLAB, R, and Python. All three versions support reading and writing LiPD files, extracting and collapsing time series, and filtering time series.

LiPD Utilities in Python 3

A note for Windows users: Although Python is available for Windows, you'll have more flexibility if you work in a Linux environment. To start working with Linux, you could ask your university for an account on one of its Linux machines, then use a program like ssh to connect. Alternatively, you could run Linux in a virtual machine (e.g. with VirtualBox) on your PC.

To use the LiPD Utilities in Python, first make sure you have Python 3 installed. If you don't, one option is Anaconda.

Next, install the LiPD Utilities:

pip install LiPD

Start Python with the command "python". Then import lipd:

import lipd

There are many functions in the LiPD utilities, but here are just a few. First, read a LiPD file:

lipd.readLipd('/path/to/data/file.lpd')  # Load a specific LiPD file.  Or...
lipd.readLipd()                          # Load a LiPD file through a GUI.

After reading a LiPD file, use the extractTs() function to extract the data and metadata within the file:

data_all = lipd.extractTs()

The variable "data_all" now contains a list of dictionaries. (If you're unfamiliar with Python, read a primer to get a better understanding.) Each dictionary contains data and metadata fields. From here, it's easy to start using the data. Different data sets may not have all of the same fields, so use the ".keys()" method to check which fields are present. Let's save some data to different variables and make a simple figure:

data = data_all[0]  # Save the first time series object to a new variable. Replace "0" with a different
                    # number to select a different time series object if there is more than one.
print(data.keys())                 # Print the names of all data and metadata fields. 
print(data['dataSetName'])         # Print the contents of one field: the name of the data set.
year = data['year']                # Save the time values to a new variable.
values = data['paleoData_values']  # Save the data values to a new variable.

# Make a simple figure with matplotlib.
import matplotlib.pyplot as plt
plt.plot(year,values)
plt.title("Name: "+data['dataSetName']+", archive: "+data['archiveType'])
plt.xlabel(data['yearUnits'])
plt.ylabel(data['paleoData_variableName'])
plt.show()
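
Because different data sets may not have all of the same fields, it can help to check for the fields used above before plotting. The following is a minimal sketch in plain Python, not a function from the LiPD utilities: the helper name "missing_fields" and the sample dictionary are hypothetical, and only the field names are taken from the example above.

```python
# Fields used by the plotting example on this page.
PLOT_FIELDS = (
    "dataSetName",
    "archiveType",
    "year",
    "yearUnits",
    "paleoData_values",
    "paleoData_variableName",
)


def missing_fields(ts, required=PLOT_FIELDS):
    """Return the names in `required` that are absent from the dictionary `ts`."""
    return [key for key in required if key not in ts]


# Hypothetical stand-in for one entry of data_all (made-up values).
sample = {
    "dataSetName": "ExampleSite",
    "year": [1900, 1950, 2000],
    "paleoData_values": [0.1, 0.3, 0.2],
}

absent = missing_fields(sample)
if absent:
    print("Cannot make the full figure; missing fields:", absent)
```

With real data, call missing_fields(data) on a time series object taken from data_all and only plot when the returned list is empty.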

From here, explore a LiPD file some more on your own. There are other commands in the LiPD utilities, but the ones above are enough to access the data on a basic level. If you'd like to use pre-built functions to explore the paleo data, see the Pyleoclim section farther down this page.
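
One simple way to explore further is to keep only the time series whose metadata match a given value. The LiPD utilities have their own filtering support; the sketch below does not use it and instead filters the list of dictionaries with plain Python. The helper name "select_ts" and the sample entries are hypothetical.

```python
def select_ts(ts_list, key, value):
    """Keep the time series dictionaries whose `key` field equals `value`.

    Entries that lack the field are skipped rather than raising KeyError.
    """
    return [ts for ts in ts_list if key in ts and ts[key] == value]


# Hypothetical stand-in for data_all (made-up values).
sample_all = [
    {"dataSetName": "SiteA", "archiveType": "coral"},
    {"dataSetName": "SiteB", "archiveType": "tree"},
    {"dataSetName": "SiteC"},  # This entry has no archiveType field.
]

corals = select_ts(sample_all, "archiveType", "coral")
print([ts["dataSetName"] for ts in corals])
```

With real data, pass data_all in place of sample_all and choose a field name reported by ".keys()".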

Pyleoclim

CSV files

If you’re in a jam and need a plaintext version of the data, all LiPD files contain .csv files of the raw data. Simply unzip your LiPD file to find them. However, a central goal of LiPD is to put paleoclimate data into a standardized format so that common analysis scripts can be built around it, so relying on the .csv files more than necessary is not recommended.
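
The unzip step can also be done from Python. The following is a minimal sketch using only the standard-library zipfile module; the function names "list_csv_members" and "extract_csv_members" are hypothetical helpers, not part of the LiPD utilities, and the sketch assumes only what is stated above: that a LiPD file can be unzipped and contains .csv files.

```python
import zipfile


def list_csv_members(lipd_path):
    """Return the names of all .csv members inside a LiPD (zip) archive."""
    with zipfile.ZipFile(lipd_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]


def extract_csv_members(lipd_path, out_dir):
    """Extract only the .csv members into out_dir and return their names."""
    with zipfile.ZipFile(lipd_path) as archive:
        names = [name for name in archive.namelist()
                 if name.lower().endswith(".csv")]
        archive.extractall(out_dir, members=names)
    return names
```

For example, list_csv_members('/path/to/data/file.lpd') returns the .csv file names without unpacking anything, and extract_csv_members('/path/to/data/file.lpd', 'csv_out') writes them to a folder named "csv_out" (a placeholder name).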