Using LiPD files

From Linked Earth Wiki
Revision as of 15:59, 26 June 2017 by Michaelerb (Talk | contribs) (Added some additional information for Windows users.)

There are several ways to use a LiPD file once you have downloaded it. The recommended ones are the LiPD Utilities and Pyleoclim.

A Note for Windows Users

A Linux environment is recommended for data analysis. Mac and Linux users already have a Unix-style terminal, so they can skip the rest of this section. If you're using Windows and would like to start using Linux, here are two options:

  1. Ask your university for an account on their Linux machine. Then use an ssh program like PuTTY to connect from your PC. An X-windows program like Xming is also needed to produce graphical windows.
  2. If option one isn't a possibility, you could install virtualization software (e.g. VirtualBox) on your PC and run a Linux virtual machine in it.

If you prefer to stay in Windows, you can install Python directly on your PC. A recommended Python distribution is Anaconda. Make sure to select the Python 3 version.

After installing Python, open a Windows command prompt. Despite cosmetic similarities, a Windows command prompt is different from a Linux terminal and generally uses different commands. To launch Python from here, type "python". To install Python packages in Windows, exit Python and type:

python -m pip install package_name

where "package_name" is the name of a Python package, such as LiPD. Not all packages are available for Windows, however, so Linux is still recommended.
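
To confirm that an installation worked, it can help to check from inside Python whether a package is visible. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for this illustration, and "lipd" is the import name used later on this page.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so this check should always succeed.
print("json available:", is_installed("json"))

# This reports whether the LiPD utilities are visible to this Python installation.
print("lipd available:", is_installed("lipd"))
```

If the second line prints False, the package was probably installed for a different Python installation than the one currently running.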

LiPD Utilities

The LiPD Utilities are a primary way to interact with LiPD files. The utilities are available on GitHub for Matlab, R, and Python. All three versions support reading and writing a LiPD file, extracting and collapsing time series, and filtering time series.

LiPD Utilities in Python 3

To use the LiPD Utilities in Python, first make sure you have Python 3 installed. If you don't, one option is Anaconda.

Next install LiPD Utilities:

pip install LiPD

Start Python with the command "python". Then import lipd:

import lipd

There are many functions in the LiPD utilities, but here are just a few. First, read a LiPD file:

lipd.readLipd('/path/to/data/file.lpd')  # Load a specific LiPD file.  Or...
lipd.readLipd()                          # Load a LiPD file through a GUI.

After reading a LiPD file, use the extractTs() function to extract the data and metadata within the file:

data_all = lipd.extractTs()

The variable "data_all" now holds a list of dictionaries, one per time series. (If you're unfamiliar with Python, read a primer to get a better understanding.) Each dictionary contains data and metadata fields. From here, it's easy to start using the data:

data = data_all[0]  # Save the first time series object to a new variable. Replace "0" with a different
                    # number to select a different time series object if there is more than one.
print(data.keys())                 # Print the names of all data and metadata fields. 
print(data['dataSetName'])         # Print the contents of one field: the name of the data set.
year = data['year']                # Save the time values to a new variable.
values = data['paleoData_values']  # Save the data values to a new variable.
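
Selecting by index only works if you already know the order of the time series. As an alternative, here is a minimal sketch that selects entries by the value of a metadata field. The helper find_series and the sample records are made up for illustration; the records stand in for the list returned by lipd.extractTs() and reuse only field names shown above.

```python
def find_series(data_all, variable_name):
    """Return the entries whose 'paleoData_variableName' equals variable_name."""
    return [
        ts for ts in data_all
        if ts.get("paleoData_variableName") == variable_name
    ]


# Hypothetical stand-in for the list returned by lipd.extractTs().
data_all = [
    {"dataSetName": "Example.Site.2017",
     "paleoData_variableName": "d18O",
     "paleoData_values": [1.0, 1.2]},
    {"dataSetName": "Example.Site.2017",
     "paleoData_variableName": "temperature",
     "paleoData_values": [14.1, 13.8]},
]

matches = find_series(data_all, "temperature")
print(len(matches), "matching time series")
```

Using .get() rather than square brackets means that entries lacking the field are skipped instead of raising a KeyError.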

Different data sets may not have all of the same fields, so use the ".keys()" method to check which ones are present. Now, let's make a simple figure:

# Make a simple figure with matplotlib.
import matplotlib.pyplot as plt
plt.plot(year, values)
plt.title("Name: " + data['dataSetName'] + ", archive: " + data['archiveType'])
plt.xlabel(data['yearUnits'])
plt.ylabel(data['paleoData_variableName'])
plt.show()
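
Because not every data set has every field, the plotting code above will raise a KeyError if, for example, 'archiveType' is missing. One possible way to guard against that is sketched below. The helper plot_labels, the fallback strings, and the sample record are all assumptions made for this illustration and are not part of the LiPD utilities.

```python
def plot_labels(ts):
    """Build a title and axis labels from a time series dictionary,
    substituting placeholder text for any missing metadata field."""
    name = ts.get("dataSetName", "unknown")
    archive = ts.get("archiveType", "unknown")
    return {
        "title": "Name: " + name + ", archive: " + archive,
        "xlabel": ts.get("yearUnits", "year"),
        "ylabel": ts.get("paleoData_variableName", "value"),
    }


# Hypothetical record with two fields missing; real records come from lipd.extractTs().
example = {"dataSetName": "Example.Site.2017", "yearUnits": "AD"}
labels = plot_labels(example)
print(labels["title"])
```

The returned strings can then be passed to plt.title(), plt.xlabel(), and plt.ylabel() in place of the direct dictionary lookups.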

From here, explore a LiPD file some more on your own. There are other commands in the LiPD utilities, but the ones above are enough to access the data on a basic level. If you'd like to use pre-built functions to explore the paleo data, see the Pyleoclim section farther down this page.

Pyleoclim

CSV files

If you’re in a jam and need a plaintext version of the data, all LiPD files contain .csv files of the raw data. Simply unzip your LiPD file to find a .csv file. However, a central goal of LiPD is to put paleoclimate data into a standardized format for which common analysis scripts can be built, so relying on .csv files more than necessary is not recommended.
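
Since a LiPD file can be unzipped, the .csv files can also be located from Python with the standard zipfile module. The following is a minimal sketch; the function names are made up for this illustration, and it makes no assumption about where inside the archive the .csv files are stored.

```python
import zipfile


def list_csv_members(lipd_path):
    """Return the names of all .csv files stored inside a LiPD (zip) archive."""
    with zipfile.ZipFile(lipd_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]


def extract_csv_members(lipd_path, out_dir):
    """Extract only the .csv files into out_dir and return their names."""
    names = list_csv_members(lipd_path)
    with zipfile.ZipFile(lipd_path) as archive:
        archive.extractall(out_dir, members=names)
    return names


# Example call, using the placeholder path from earlier on this page:
# print(list_csv_members('/path/to/data/file.lpd'))
```

Listing the members first lets you see which files are present before extracting anything to disk.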