Using LiPD files

From Linked Earth Wiki
Latest revision as of 18:45, 16 May 2019

After downloading a LiPD file, there are several ways to use it. The recommended ways are the LiPD Utilities and Pyleoclim.

A Note for Windows Users

The LiPD utilities are compatible with Windows. However, most of the more advanced data analysis tools are only available for Mac or Linux platforms. If you're using Windows and would like to start using Linux, here are two options:

  1. Ask your university for an account on their Linux machine. Then use an SSH client like PuTTY to connect from your PC. An X Window server like Xming is also needed to display graphical windows.
  2. If option one isn't possible, you could install an Ubuntu virtual machine on your PC (e.g. with VirtualBox).

If you insist on working within Windows, you can install Python on your PC directly from the Python website. Make sure to select the Python 3 version.

After installing Python, open a Windows command prompt. Despite cosmetic similarities, the Windows command prompt is different from a Linux terminal and generally uses different commands. To launch Python from here, type "python". To install Python packages on Windows, exit Python and type:

python -m pip install package_name

where "package_name" is the name of a Python package, such as LiPD. However, some packages may not be available for Windows, so Linux is still recommended.

Another option is to use the Anaconda Python distribution.
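Whichever installation route you choose, you can confirm from inside Python that the interpreter really is Python 3. This short sketch uses only the standard sys module; the helper name is_python3 is hypothetical.

```python
import sys

def is_python3() -> bool:
    """Return True if the running interpreter is Python 3 or newer (hypothetical helper)."""
    return sys.version_info.major >= 3

# Print the full version string and whether it meets the Python 3 requirement.
print(sys.version)
print("Python 3 or newer:", is_python3())
```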

LiPD Utilities

The LiPD Utilities are a primary way to interact with LiPD files. They are available on GitHub in MATLAB, R, and Python. All three versions support reading and writing LiPD files, extracting and collapsing time series, and filtering time series.

LiPD Utilities in Python 3

To use the LiPD Utilities in Python, first make sure you have Python 3 installed. If you don't, one option is Anaconda.

Next, install the LiPD Utilities:

pip install LiPD

If you run into any errors, they may indicate other packages you need to install first. Next, start Python with the command "python", then import lipd:

import lipd
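If the import fails, Python raises a ModuleNotFoundError. To check ahead of time which packages can be imported, here is a small sketch using the standard library's importlib.util.find_spec, which returns None when a top-level module cannot be found. The helper name missing_packages is hypothetical, and the package names passed to it are only an example.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level module names from `names` that cannot be imported (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# The pip package "LiPD" is imported under the module name "lipd".
missing = missing_packages(["lipd", "matplotlib"])
if missing:
    print("Not installed (try pip install):", ", ".join(missing))
else:
    print("All packages found.")
```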

There are many functions in the LiPD utilities, but here are just a few. First, read a LiPD file:

data_single   = lipd.readLipd('/path/to/data/file.lpd')  # Load a specific LiPD file.  Or...
data_all      = lipd.readLipd('/path/to/data/')          # Load all LiPD files in a directory.  Or...
data_selected = lipd.readLipd()                          # Load a LiPD file through a GUI.
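Before passing a directory to readLipd(), it can help to see which LiPD files it contains. The sketch below uses only pathlib; it assumes LiPD files carry the .lpd extension (as in the examples above) and does not reproduce how readLipd() itself scans a directory. The helper list_lpd_files is hypothetical.

```python
from pathlib import Path

def list_lpd_files(directory):
    """Return the sorted names of regular files ending in .lpd directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.lpd") if p.is_file())

# Example (hypothetical path):
# print(list_lpd_files("/path/to/data/"))
```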

The data will be stored in a dictionary, from which you can access all of the data.

Some files contain multiple time series. To rearrange the data into a list of individual time series, use the extractTs() function:

data_all_ts = lipd.extractTs(data_all)

A list of dictionaries, one per time series, is now stored in the variable "data_all_ts". (If you're unfamiliar with Python, read a primer to get a better understanding.) From here, the time series can be filtered by their metadata fields with the filterTs() function. For example, to keep only records with a positive mean latitude (the Northern Hemisphere):

data_NH = lipd.filterTs(data_all_ts, 'geo_meanLat > 0')
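To illustrate what a condition like 'geo_meanLat > 0' selects, here is a plain-Python sketch that keeps only the time series whose metadata field passes a comparison. This is not how filterTs() is implemented; the helper filter_by_field and the toy records are hypothetical, and only the field names come from this page. Records that lack the field are skipped rather than raising KeyError.

```python
import operator

# Map comparison strings to functions from the standard operator module.
COMPARISONS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
               "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def filter_by_field(ts_list, field, op, value):
    """Keep time series (dicts) whose `field` compares true against `value`.

    Time series that lack the field are skipped rather than raising KeyError.
    """
    compare = COMPARISONS[op]
    return [ts for ts in ts_list if field in ts and compare(ts[field], value)]

# Hypothetical toy records standing in for the output of extractTs().
toy_ts = [
    {"dataSetName": "SiteNorth", "geo_meanLat": 45.0},
    {"dataSetName": "SiteSouth", "geo_meanLat": -30.0},
    {"dataSetName": "SiteUnknown"},
]
northern = filter_by_field(toy_ts, "geo_meanLat", ">", 0)
print([ts["dataSetName"] for ts in northern])  # ['SiteNorth']
```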

Different data sets may not have all of the same fields, so use the ".keys()" method to check which fields are in your data set.
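When working with many time series at once, it can help to find the fields that every one of them shares before writing code that depends on those fields. A minimal sketch using set intersection; the helper common_fields and the toy records are hypothetical.

```python
def common_fields(ts_list):
    """Return the sorted field names present in every dict in `ts_list` (hypothetical helper)."""
    if not ts_list:
        return []
    shared = set(ts_list[0].keys())
    for ts in ts_list[1:]:
        shared.intersection_update(ts.keys())  # keep only keys also present in this record
    return sorted(shared)

# Hypothetical toy records standing in for the output of extractTs().
toy_ts = [
    {"dataSetName": "SiteA", "archiveType": "coral", "geo_meanLat": 10.0},
    {"dataSetName": "SiteB", "archiveType": "tree"},
]
print(common_fields(toy_ts))  # ['archiveType', 'dataSetName']
```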

After running extractTs() as above, you can access the data and metadata fields of each time series. From here, it's easy to start using the data:

data = data_all_ts[0]              # Save the first time series object to a new variable. Replace "0" with another
                                   # index to select a different time series object, if there is more than one.
print(data.keys())                 # Print the names of all data and metadata fields.
print(data['dataSetName'])         # Print the contents of one field: the name of the data set.
year = data['year']                # Save the time values to a new variable.
values = data['paleoData_values']  # Save the data values to a new variable.
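Before plotting, you may want to drop points where either the time or the data value is missing. How missing values are encoded can vary between files; the sketch below assumes they appear as None, as a float NaN, or as the string "nan", and the helper drop_missing is hypothetical.

```python
import math

def _is_missing(x):
    """Treat None, float NaN, and the string 'nan' (any case) as missing (assumed encodings)."""
    if x is None:
        return True
    if isinstance(x, str):
        return x.strip().lower() == "nan"
    try:
        return math.isnan(x)
    except TypeError:
        return False

def drop_missing(year, values):
    """Return (year, values) lists keeping only pairs where both entries are present."""
    pairs = [(y, v) for y, v in zip(year, values)
             if not _is_missing(y) and not _is_missing(v)]
    return [y for y, _ in pairs], [v for _, v in pairs]

year_clean, values_clean = drop_missing([1900, 1901, None, 1903],
                                        [0.5, float("nan"), 0.7, "nan"])
print(year_clean, values_clean)  # [1900] [0.5]
```

With real data you would call year, values = drop_missing(year, values) before plotting. Note that zip() stops at the shorter list, so if the two lists differ in length the extra entries are silently dropped.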

Now, let's make a simple figure:

# Make a simple figure with matplotlib.
import matplotlib.pyplot as plt
plt.plot(year,values)
plt.title("Name: "+data['dataSetName']+", archive: "+data['archiveType'])
plt.xlabel(data['yearUnits'])
plt.ylabel(data['paleoData_variableName'])
plt.show()

From here, explore a LiPD file some more on your own. There are other commands in the LiPD Utilities, but the ones above are enough to access the data at a basic level. To get started with more commands, read the Quickstart Guide on GitHub.
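One simple way to explore further is to loop over every time series returned by extractTs() and print a one-line summary of each. The sketch below uses dict.get() so that a missing field prints as a placeholder instead of raising KeyError; the helper summarize and the toy record are hypothetical, while the field names are the ones used above.

```python
def summarize(ts_list):
    """Return one summary line per time series, tolerating missing fields (hypothetical helper)."""
    lines = []
    for i, ts in enumerate(ts_list):
        name = ts.get("dataSetName", "?")
        archive = ts.get("archiveType", "?")
        variable = ts.get("paleoData_variableName", "?")
        lines.append(f"{i}: {name} | {archive} | {variable}")
    return lines

# Hypothetical toy input; with real data, pass data_all_ts from extractTs().
for line in summarize([{"dataSetName": "SiteA", "archiveType": "coral"}]):
    print(line)  # 0: SiteA | coral | ?
```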

If you'd like to use pre-built functions to explore the paleo data, see the Pyleoclim section farther down this page.

Pyleoclim

Pyleoclim is another primary way of interacting with LiPD files. While the LiPD Utilities offer a more manual approach to analysis, Pyleoclim provides a variety of pre-built functions that can save time and effort. Install Pyleoclim with:

pip install pyleoclim

To get started with Pyleoclim, see the Pyleoclim wiki page. In particular, read its Quickstart Guide.

Please note that some of the dependencies used by Pyleoclim are unavailable to Windows users.

CSV files

Example of an unzipped LiPD file. The data are contained in the .csv files, while the metadata are stored in the JSON-LD file.

If you need a plain-text version of the data, every LiPD file contains .csv files of the raw data: simply unzip your LiPD file to find them. However, a central goal of LiPD is to put paleoclimate data into a standardized format for which common analysis scripts can be built, so relying on the .csv files more than necessary is not recommended. Also, re-zipping an unzipped LiPD file will not turn it back into a valid LiPD file.
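Because a LiPD file is a zip archive, you can also list and read its .csv files from Python without unzipping it by hand, which leaves the original .lpd file untouched. A minimal sketch using the standard zipfile and csv modules; it searches by file extension rather than assuming a particular folder layout inside the archive, it assumes the .csv files are UTF-8 encoded, and the helper names are hypothetical.

```python
import csv
import io
import zipfile

def list_csv_members(lpd_path):
    """Return the names of all .csv files stored inside a LiPD (zip) archive."""
    with zipfile.ZipFile(lpd_path) as archive:
        return [name for name in archive.namelist() if name.lower().endswith(".csv")]

def read_csv_member(lpd_path, member):
    """Read one .csv member into a list of rows (each row a list of strings)."""
    with zipfile.ZipFile(lpd_path) as archive, archive.open(member) as raw:
        # Decode the binary stream as UTF-8 text (an assumption) for the csv reader.
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        return list(csv.reader(text))

# Example (hypothetical path):
# for name in list_csv_members("/path/to/data/file.lpd"):
#     print(name, read_csv_member("/path/to/data/file.lpd", name)[:3])
```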

Downloading a csv file

You can also find the .csv files on the wiki and download them.