Using LiPD files

From Linked Earth Wiki
Revision as of 06:22, 24 June 2017 by Michaelerb (Talk | contribs) (Minor edit.)

Once you have downloaded a LiPD file, there are several ways to use it. The recommended ways are the LiPD Utilities and Pyleoclim.

LiPD Utilities

The LiPD Utilities are a primary way to interact with LiPD files. The utilities are available on GitHub for Matlab, R, and Python. All three versions support reading and writing LiPD files, extracting and collapsing time series, and filtering time series.

LiPD Utilities in Python 3

A note for Windows users: Although Python is available for Windows, you'll have more flexibility if you work in a Linux environment. To start working with Linux, you could ask your university for an account on one of their Linux machines, then use a program like ssh to connect. Alternatively, you could run Linux in a virtual machine (e.g. with VirtualBox) on your PC.

To use the LiPD Utilities in Python, first make sure you have Python 3 installed. If you don't, one option is Anaconda.

Next, install the LiPD Utilities:

pip install LiPD

Start Python with the command "python", then import lipd:

import lipd

The LiPD Utilities contain many functions; here are just a few. First, read a LiPD file:

lipd.readLipd('/path/to/data/file.lpd')  # Load a specific LiPD file.  Or...
lipd.readLipd()                          # Load a LiPD file through a GUI.

After reading a LiPD file, use the extractTs() function to extract the data and metadata within the file:

data_all = lipd.extractTs()

The variable "data_all" now contains a list of dictionaries, one per time series object. (If you're unfamiliar with Python, read a primer to get a better understanding.) Each dictionary contains data and metadata fields. From here, it's easy to start using the data:

data = data_all[0]  # Save the first time series object to a new variable. Replace "0" with a different
                    # number to select a different time series object if there is more than one.
print(data.keys())                 # Print the names of all data and metadata fields. 
print(data['dataSetName'])         # Print the contents of one field: the name of the data set.
year = data['year']                # Save the time values to a new variable.
values = data['paleoData_values']  # Save the data values to a new variable.
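
When a file holds several time series, finding the right index by hand can be tedious. The following is a minimal sketch in plain Python, not a function from the LiPD Utilities: the helper name "select_series" is hypothetical, and the sample list is made up to stand in for the output of extractTs(). Only the field names come from the examples above.

```python
# Made-up stand-in for the list returned by lipd.extractTs(); only the field
# names are taken from the examples above, the values are hypothetical.
data_all = [
    {"dataSetName": "ExampleSite", "paleoData_variableName": "d18O",
     "year": [1900, 1950, 2000], "paleoData_values": [-5.1, -4.8, -4.9]},
    {"dataSetName": "ExampleSite", "paleoData_variableName": "temperature",
     "year": [1900, 1950, 2000], "paleoData_values": [11.2, 11.6, 12.0]},
]


def select_series(ts_list, variable_name):
    """Return (index, series) pairs whose variable name matches exactly."""
    return [
        (i, ts) for i, ts in enumerate(ts_list)
        # .get() returns None instead of raising KeyError if the field is absent.
        if ts.get("paleoData_variableName") == variable_name
    ]


# Print the index, data set name, and values of every matching time series.
for index, ts in select_series(data_all, "temperature"):
    print(index, ts["dataSetName"], ts["paleoData_values"])
```

With a real file, the same helper could be applied to the list returned by extractTs(), and the returned index used in place of "0" above.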

Different data sets may not have all of the same fields, so use the ".keys()" method to check which ones are present. Now, let's make a simple figure:

# Make a simple figure with matplotlib.
import matplotlib.pyplot as plt
plt.plot(year,values)
plt.title("Name: "+data['dataSetName']+", archive: "+data['archiveType'])
plt.xlabel(data['yearUnits'])
plt.ylabel(data['paleoData_variableName'])
plt.show()
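
Because not every data set has every field, the figure code above raises a KeyError if, for example, 'archiveType' or 'yearUnits' is missing. One possible way around this, sketched here in plain Python, is to build the labels with fallback values first. The helper name "figure_labels" and the fallback strings are assumptions chosen for illustration, not part of the LiPD Utilities.

```python
def figure_labels(ts):
    """Build (title, xlabel, ylabel) for one time series dictionary.

    Missing fields are replaced by placeholder text instead of raising
    KeyError. The placeholder strings are arbitrary choices.
    """
    title = ("Name: " + ts.get("dataSetName", "unknown")
             + ", archive: " + ts.get("archiveType", "unknown"))
    xlabel = ts.get("yearUnits", "year")
    ylabel = ts.get("paleoData_variableName", "value")
    return title, xlabel, ylabel


# Hypothetical time series object that lacks 'archiveType' and 'yearUnits'.
example = {"dataSetName": "ExampleSite", "paleoData_variableName": "d18O"}
title, xlabel, ylabel = figure_labels(example)
print(title)
print(xlabel, ylabel)
```

The three strings can then be passed to plt.title(), plt.xlabel(), and plt.ylabel() in place of the direct dictionary lookups.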

From here, explore a LiPD file some more on your own. There are other commands in the LiPD utilities, but the ones above are enough to access the data on a basic level. If you'd like to use pre-built functions to explore the paleo data, see the Pyleoclim section farther down this page.

Pyleoclim

CSV files

If you’re in a jam and need a plain-text version of the data, every LiPD file contains .csv files of the raw data. Simply unzip your LiPD file to find them. However, a central goal of LiPD is to put paleoclimate data into a standardized format that common analysis scripts can be built around, so relying on the .csv files more than necessary is not recommended.
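
Since a LiPD file can be unzipped, Python's standard zipfile module should be able to list the .csv files inside it without extracting anything. This is a minimal sketch under that assumption; the function name "list_csv_files" is hypothetical and not part of the LiPD Utilities.

```python
import zipfile


def list_csv_files(lipd_path):
    """Return the names of all .csv members inside a LiPD (zip) archive."""
    with zipfile.ZipFile(lipd_path) as archive:
        # namelist() gives every member path stored in the archive.
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]
```

For example, list_csv_files('/path/to/data/file.lpd') would return the paths of the .csv files stored in that archive, which can then be extracted with the same module or with any unzip tool.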