Difference between revisions of "Querying the Datasets"

From Linked Earth Wiki
( Pages with syntax highlighting errors )
(Querying Linked Earth Data from another Program/Script)
(Add link to Notebook)
 
(16 intermediate revisions by 2 users not shown)

Latest revision as of 17:30, 13 September 2017

Semantic MediaWiki Queries

The data in the wiki can be queried, and the results embedded in wiki pages, using Semantic MediaWiki (SMW) queries. The queries refer to the categories and properties defined in the wiki, which are listed at Special:Categories and Special:Properties.

Example: Get a List of Datasets (limit to 5)

{{ #ask: [[Category:Dataset_(L)]]
 | mainlabel=Datasets
 | format=broadtable
 | limit=5
}}
Datasets
A7.Oppo.2005
Afr-ColdAirCave.Sundqvist.2013
Afr-LakeMalawi.Powers.2011
Afr-LakeTanganyika.Tierney.2010
Afr-P178-15P.Tierney.2015
... further results
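
The same query can also be sent from outside the wiki. The following is a minimal sketch in Python, assuming that the wiki exposes Semantic MediaWiki's ask API module at http://wiki.linked.earth/wiki/api.php (the API address used by the download script on this page). The helper name build_ask_params is made up for illustration. The sketch only assembles the request parameters, so it runs without network access.

```python
def build_ask_params(conditions, printouts=(), limit=5):
    """Assemble parameters for Semantic MediaWiki's `ask` API module.

    conditions: list of condition strings, e.g. ["[[Category:Dataset_(L)]]"]
    printouts:  property names to print, without the leading "?"
    """
    parts = ["".join(conditions)]
    parts += ["?" + p for p in printouts]
    parts.append("limit=%d" % limit)
    # Conditions, printouts and options are joined with "|", as in #ask
    return {"action": "ask", "query": "|".join(parts), "format": "json"}


params = build_ask_params(["[[Category:Dataset_(L)]]"], limit=5)
print(params["query"])

# To send the request (assumes the `requests` package is installed and
# that the `ask` module is enabled on the wiki):
#   import requests
#   result = requests.get("http://wiki.linked.earth/wiki/api.php",
#                         params=params).json()
```

Keeping the parameter assembly separate from the network call makes it easy to check the query string before sending it.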

Example: Get a List of Datasets that have paleo data based on d18O Proxy (limit 5)

{{ #ask: [[Category:Dataset_(L)]] [[IncludesPaleoData_(L).FoundInMeasurementTable_(L).IncludesVariable_(L).ProxyObservationType_(L)::D18O]]
 | ?IncludesPaleoData_(L)=PaleoData
 | format=broadtable
 | limit=5
}}
Dataset                            PaleoData
A7.Oppo.2005                       A7.Oppo.2005.PaleoData1
Afr-ColdAirCave.Sundqvist.2013     Afr-ColdAirCave.Sundqvist.2013.PaleoData1
Ant-BerknerIsland.Mulvaney.2002    Ant-BerknerIsland.Mulvaney.2002.PaleoData1
Ant-CoastalDML.Thamban.2012        Ant-CoastalDML.Thamban.2012.PaleoData1
Ant-DSS.Moy.2012                   Ant-DSS.Moy.2012.PaleoData1
... further results

Example: Get a List of Datasets with archive type "Sclerosponge" and plot them on a map

{{#ask: 
[[Category:Location_(L)]] 
[[CoordinatesFor.IncludesPaleoData_(L).FoundInMeasurementTable_(L).IncludesVariable_(L).HasProxySystem_(L).ProxyArchiveType_(L)::Sclerosponge]]
 | ?Coordinates
 | ?CoordinatesFor
 | ?Name_(L)
 | showtitle=off
 | maxzoom=14
 | minzoom=1
 | limit=500
 | template=LiPDLocation
 | format=leaflet
}}

SPARQL Queries

More complex queries can be run with SPARQL against the wiki's triple store. The SPARQL endpoint is http://wiki.linked.earth/store/ds/query; send the text of the SPARQL query in the query parameter of the request. Results can be returned in a variety of formats. The SPARQL queries refer to the Linked Earth core ontology terms found at http://linked.earth/ontology.

Note: The mapping between wiki terms and ontology terms is shown on every wiki Property or Category page as the "Imported from" value. For example, the property Property:ArchivedIn_(L) imports the term core:archivedIn from the ontology, where the "core:" prefix refers to the Linked Earth ontology at http://linked.earth/ontology.
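
In the examples on this page, the ontology term appears to be the wiki property name with the "_(L)" suffix dropped and the first letter lowercased (ArchivedIn_(L) becomes core:archivedIn, IncludesPaleoData_(L) becomes core:includesPaleoData). The following Python sketch encodes that observation. It is only a heuristic with a made-up helper name; the "Imported from" value on the property page remains the authoritative mapping.

```python
def guess_core_term(wiki_property):
    """Guess the ontology term for a wiki property name (heuristic only)."""
    name = wiki_property
    # Drop the namespace prefix if the full page title was given
    if name.startswith("Property:"):
        name = name[len("Property:"):]
    # Drop the "_(L)" suffix used by the wiki property pages
    if name.endswith("_(L)"):
        name = name[:-len("_(L)")]
    # Ontology terms in the examples start with a lowercase letter
    return "core:" + name[:1].lower() + name[1:]


print(guess_core_term("Property:ArchivedIn_(L)"))
print(guess_core_term("FoundInMeasurementTable_(L)"))
```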

Get Datasets that have paleo data based on d18O Proxy (along with file names of the csv files)

PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
SELECT ?s ?pd ?table ?file
WHERE {
  ?s a core:Dataset .
  ?s core:includesPaleoData ?pd .
  ?pd core:foundInMeasurementTable ?table .
  ?table core:includesVariable ?var .
  ?var core:proxyObservationType wiki:D18O .
  ?table core:hasFileName ?file
}

The following is a live URL that queries the SPARQL endpoint with the above query:

http://wiki.linked.earth/store/ds/query?query=PREFIX+core%3A+%3Chttp%3A%2F%2Flinked.earth%2Fontology%23%3E%0APREFIX+wiki%3A+%3Chttp%3A%2F%2Fwiki.linked.earth%2FSpecial%3AURIResolver%2F%3E%0ASELECT+%3Fs+%3Fpd+%3Ftable+%3Ffile%0AWHERE+%7B%0A++%3Fs+a+core%3ADataset+.%0A++%3Fs+core%3AincludesPaleoData+%3Fpd+.%0A++%3Fpd+core%3AfoundInMeasurementTable+%3Ftable+.%0A++%3Ftable+core%3AincludesVariable+%3Fvar+.%0A++%3Fvar+core%3AproxyObservationType+wiki%3AD18O+.%0A++%3Ftable+core%3AhasFileName+%3Ffile%0A%7D
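
A URL of this form can be produced by percent-encoding the query text into the query parameter. The following is a minimal Python sketch using only the standard library; the helper name build_query_url is an illustrative choice, not part of any Linked Earth tooling.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

ENDPOINT = "http://wiki.linked.earth/store/ds/query"

QUERY = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
SELECT ?s ?pd ?table ?file
WHERE {
  ?s a core:Dataset .
  ?s core:includesPaleoData ?pd .
  ?pd core:foundInMeasurementTable ?table .
  ?table core:includesVariable ?var .
  ?var core:proxyObservationType wiki:D18O .
  ?table core:hasFileName ?file
}"""


def build_query_url(endpoint, sparql):
    """Return a GET URL carrying the SPARQL text in the `query` parameter."""
    # urlencode turns spaces into "+" and newlines into "%0A"
    return endpoint + "?" + urlencode({"query": sparql})


url = build_query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text
decoded = parse_qs(urlsplit(url).query)["query"][0]
```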

Get Datasets that have inferred variables with a calibration uncertainty less than 0.5

PREFIX core: <http://linked.earth/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ds ?data ?v ?val
WHERE {
  ?v a core:InferredVariable .
  ?v core:calibratedVia ?calibration .
  ?calibration core:hasUncertainty ?unc .
  ?unc core:hasValue ?val .
  FILTER (xsd:double(?val) < 0.5) .
  ?ds a core:Dataset .
  ?v core:foundInTable ?table .
  ?data core:foundInMeasurementTable ?table .
  ?ds core:includesPaleoData ?data
}

The following is a live URL that queries the SPARQL endpoint with the above query:

http://wiki.linked.earth/store/ds/query?query=PREFIX+core%3A+%3Chttp%3A%2F%2Flinked.earth%2Fontology%23%3E%0APREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0ASELECT+%3Fds+%3Fdata+%3Fv+%3Fval%0AWHERE+%7B%0A++%3Fv+a+core%3AInferredVariable+.%0A++%3Fv+core%3AcalibratedVia+%3Fcalibration+.%0A++%3Fcalibration+core%3AhasUncertainty+%3Func+.%0A++%3Func+core%3AhasValue+%3Fval+.%0A++FILTER+(xsd%3Adouble(%3Fval)+%3C+0.5)+.%0A++%3Fds+a+core%3ADataset+.%0A++%3Fv+core%3AfoundInTable+%3Ftable+.%0A++%3Fdata+core%3AfoundInMeasurementTable+%3Ftable+.%0A++%3Fds+core%3AincludesPaleoData+%3Fdata%0A%7D

Querying Linked Earth Data from another Program/Script

Because the endpoint at http://wiki.linked.earth/store/ds/query accepts remote requests, queries can also be made programmatically from a program or script in any language. The examples below use Python.

Fetch all dataset names from the wiki

import requests

# SPARQL endpoint of the Linked Earth wiki triple store
url = "http://wiki.linked.earth/store/ds/query"

# Select every dataset together with its name
query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
SELECT ?ds ?name
WHERE {
  ?ds a core:Dataset .
  ?ds core:name ?name
}"""

response = requests.post(url, data={'query': query})
response.raise_for_status()  # Stop early on an HTTP error
res = response.json()

# Each binding maps a variable name to a dict holding its value
for item in res['results']['bindings']:
    print(item['name']['value'])
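
The loop above reads one variable from each result row. When a query selects several variables, it can help to flatten the response first. The following Python sketch assumes the endpoint returns the standard SPARQL JSON results layout (a head.vars list plus results.bindings), which is the layout the script above already relies on. The sample data is hand-written for illustration, with a placeholder URI, and is not real wiki output.

```python
def bindings_to_rows(result):
    """Turn a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # A variable may be unbound in a given row, so fall back to None
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


# Hand-written sample in the SPARQL JSON results layout (placeholder URI)
sample = {
    "head": {"vars": ["ds", "name"]},
    "results": {"bindings": [
        {"ds": {"type": "uri", "value": "http://example.org/A7.Oppo.2005"},
         "name": {"type": "literal", "value": "A7.Oppo.2005"}},
        {"ds": {"type": "uri", "value": "http://example.org/unnamed"}},
    ]},
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row["ds"], row["name"])
```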

Fetch all dataset csv files given a dataset name

import os
import sys
import requests
import urllib.request

# The dataset name is passed as the first command-line argument
datasetname = sys.argv[1]
print(datasetname)

endpoint = "http://wiki.linked.earth/store/ds/query" # Query Metadata
wikiapi = "http://wiki.linked.earth/wiki/api.php" # Fetch file data

# Find the csv file names of the measurement tables of the named dataset
query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
SELECT ?data ?filename
WHERE {
  ?ds core:includesPaleoData ?data .
  ?ds core:name '""" + datasetname + """' .
  ?data core:foundInMeasurementTable ?table .
  ?table core:hasFileName ?filename
}"""

response = requests.post(endpoint, data={'query': query})
res = response.json()
for item in res['results']['bindings']:
    fileid = item['filename']['value']
    # Ask the MediaWiki API for the download URL of the file page
    fileresponse = requests.post(wikiapi, params={
        'action': 'query',
        'prop': 'imageinfo',
        'iiprop': 'url',
        'format': 'json',
        'titles': fileid
    })
    fileres = fileresponse.json()
    for pageid in fileres['query']['pages']:
        fileitem = fileres['query']['pages'][pageid]
        fileurl = fileitem['imageinfo'][0]['url']
        print(fileurl)
        # Save the file in the current directory under its own name
        urllib.request.urlretrieve(fileurl, os.path.basename(fileurl))
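
The script above pastes the dataset name straight into the query text, so a name containing a quote character would break the query. The following Python sketch shows one way to guard against that by escaping the name as a SPARQL string literal. The helper names are made up for illustration, and the query is the same one as above with the name written as a double-quoted literal.

```python
def sparql_string_literal(text):
    """Quote text as a SPARQL string literal, escaping special characters."""
    escaped = (text.replace("\\", "\\\\")   # backslash first, then the rest
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r")
                   .replace("\t", "\\t"))
    return '"' + escaped + '"'


def build_file_query(datasetname):
    """Build the csv file name query for one dataset name."""
    return (
        "PREFIX core: <http://linked.earth/ontology#>\n"
        "SELECT ?data ?filename\n"
        "WHERE {\n"
        "  ?ds core:includesPaleoData ?data .\n"
        "  ?ds core:name " + sparql_string_literal(datasetname) + " .\n"
        "  ?data core:foundInMeasurementTable ?table .\n"
        "  ?table core:hasFileName ?filename\n"
        "}"
    )


print(build_file_query("A7.Oppo.2005"))
```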

Using Jupyter Notebook

The LinkedEarth team has put together a Jupyter Notebook that allows fairly complex queries without any coding knowledge. All you need is Python 3 or later and Jupyter Notebook installed on your computer. You can download the Notebook from https://github.com/LinkedEarth/Queries.

See Also

  • Coral Query: Learn how to search the database by archive type