Difference between revisions of "Querying the Datasets"

Revision as of 17:51, 22 June 2016

1 Semantic MediaWiki Queries
2 SPARQL Queries
- 2.1 Get Datasets that have paleo data based on d18O Proxy (along with file names of the CSV files)
- 2.2 Get Datasets that have inferred variables with an uncertainty less than 0.5
3 Querying Linked Earth Data from another Program/Script
- 3.1 Fetch all dataset names from the wiki
- 3.2 Fetch all dataset csv files given a dataset name

Semantic MediaWiki Queries

Data in the wiki can be queried, and the results embedded in wiki pages, using Semantic MediaWiki (SMW) queries. These queries refer to the categories and properties defined in the wiki, which are listed at Special:Categories and Special:Properties.

Example: Get a List of Datasets (limit to 5)

{{ #ask: [[Category:Dataset_©]]
 | mainlabel=Datasets
 | format=broadtable
 | limit=5
}}

Example: Get a List of Datasets that have paleo data based on d18O Proxy (limit 5)

{{ #ask: [[Category:Dataset_©]] [[IncludesPaleoData_©.BasedOn_©.Name_©::d18O]]
 | ?IncludesPaleoData_©=PaleoData
 | format=broadtable
 | limit=5
}}

Example: Get a List of Datasets with archive type "Sclerosponge" and plot them on a map

{{#ask: [[Category:Location_©]] [[CoordinatesFor.archiveType::Sclerosponge]]
 | ?Coordinates
 | ?CoordinatesFor
 | ?Name_©
 | showtitle=off
 | maxzoom=14
 | minzoom=1
 | limit=500
 | template=LiPDLocation
 | format=leaflet
}}
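The #ask queries above run inside wiki pages. Semantic MediaWiki also provides an `ask` module in the MediaWiki API, so the same conditions can in principle be sent from a script. The following is a minimal sketch, assuming this wiki has that module enabled at its api.php URL; `build_ask_url` is a hypothetical helper, and the code only builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Assumption: the wiki's MediaWiki API is at this URL and SMW's "ask" module is enabled.
WIKI_API = "http://wiki.linked.earth/wiki/api.php"


def build_ask_url(conditions, printouts=(), limit=5, api=WIKI_API):
    """Return a URL that runs an SMW query through the `ask` API module.

    conditions: the [[...]] part of an #ask query.
    printouts:  property names to display (the "?Property" lines), without the "?".
    """
    parts = [conditions]
    parts += ["?" + name for name in printouts]
    parts.append("limit=%d" % limit)
    # The ask module takes the whole query as one string, with "|" between parts.
    params = {"action": "ask", "query": "|".join(parts), "format": "json"}
    return api + "?" + urlencode(params)


# Same conditions as the first example above: a list of datasets, limited to 5.
url = build_ask_url("[[Category:Dataset_©]]", limit=5)
print(url)
```

Display-only parameters such as format=broadtable or template=LiPDLocation control how the wiki renders a result, so they are left out of this sketch.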

SPARQL Queries

More complex queries can be run with SPARQL against the wiki's triple store. The SPARQL endpoint is http://wiki.linked.earth/store/ds/query; send the text of the SPARQL query in a parameter named query. Results can be returned in a variety of formats. SPARQL queries use the Linked Earth core ontology terms defined at http://linked.earth/ontology.
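One common way to choose a result format is the HTTP Accept header. The sketch below prepares (but does not send) a POST request for the endpoint using only the Python standard library. The media types are the standard SPARQL result types; which of them this particular endpoint honors is an assumption, and `build_request` and `RESULT_FORMATS` are hypothetical names.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://wiki.linked.earth/store/ds/query"

# Standard media types for SPARQL SELECT results.
# Assumption: the endpoint supports content negotiation for these types.
RESULT_FORMATS = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
    "tsv": "text/tab-separated-values",
}


def build_request(query, fmt="json"):
    """Prepare a form-encoded POST carrying the query in the `query` parameter."""
    body = urlencode({"query": query}).encode("ascii")
    headers = {
        "Accept": RESULT_FORMATS[fmt],
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(ENDPOINT, data=body, headers=headers)


req = build_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1", fmt="csv")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen(req)` would send it.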

Note: The mapping between a term on the wiki and the corresponding ontology term is shown on the wiki's Property or Category page, in the "Imported from" value. For example, the property Property:ArchivedIn_© imports the term core:archivedIn from the ontology, where the "core:" prefix refers to the Linked Earth ontology at http://linked.earth/ontology.
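To make the prefix notation concrete, this small sketch expands a prefixed term into its full IRI, using the namespace that the PREFIX line of the SPARQL queries on this page declares for "core:". The `expand` helper and the `PREFIXES` table are hypothetical names used only for illustration.

```python
# Namespace bound to "core:" in the SPARQL queries on this page.
PREFIXES = {"core": "http://linked.earth/ontology#"}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name such as 'core:archivedIn' into a full IRI."""
    prefix, sep, local = term.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown or missing prefix in %r" % term)
    return prefixes[prefix] + local


print(expand("core:archivedIn"))  # http://linked.earth/ontology#archivedIn
```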

Get Datasets that have paleo data based on d18O Proxy (along with file names of the CSV files)

PREFIX core: <http://linked.earth/ontology#>
SELECT ?s ?pd ?table ?file
WHERE {
  ?s a core:Dataset .
  ?s core:includesPaleoData ?pd .
  ?pd core:basedOn ?proxy .
  ?proxy core:name "d18O" .
  ?pd core:foundInMeasurementTable ?table .
  ?table core:hasFileName ?file
}

The following is a live URL that queries the SPARQL endpoint with the above query:

http://wiki.linked.earth/store/ds/query?query=PREFIX+core%3A+%3Chttp%3A%2F%2Flinked.earth%2Fontology%23%3E%0ASELECT+%3Fs+%3Fpd+%3Ftable+%3Ffile%0AWHERE+%7B%0A++%3Fs+a+core%3ADataset+.%0A++%3Fs+core%3AincludesPaleoData+%3Fpd+.%0A++%3Fpd+core%3AbasedOn+%3Fproxy+.%0A++%3Fproxy+core%3Aname+%22d18O%22+.%0A++%3Fpd+core%3AfoundInMeasurementTable+%3Ftable+.%0A++%3Ftable+core%3AhasFileName+%3Ffile%0A%7D
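The live URL above is the endpoint address followed by the query text, URL-encoded, in the query parameter. As a sketch, it can be rebuilt with Python's standard library; `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

ENDPOINT = "http://wiki.linked.earth/store/ds/query"

QUERY = """PREFIX core: <http://linked.earth/ontology#>
SELECT ?s ?pd ?table ?file
WHERE {
  ?s a core:Dataset .
  ?s core:includesPaleoData ?pd .
  ?pd core:basedOn ?proxy .
  ?proxy core:name "d18O" .
  ?pd core:foundInMeasurementTable ?table .
  ?table core:hasFileName ?file
}"""


def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL that passes the SPARQL text in the `query` parameter."""
    # urlencode turns spaces into "+" and percent-encodes reserved characters.
    return endpoint + "?" + urlencode({"query": query})


live_url = build_query_url(QUERY)
print(live_url)
```

Note that `urlencode` also percent-encodes parentheses, so for queries that contain them the generated URL can differ textually from a hand-written one while decoding to the same query.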

Get Datasets that have inferred variables with an uncertainty less than 0.5

PREFIX core: <http://linked.earth/ontology#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?ds ?data ?v ?val
WHERE {
  ?v a core:InferredVariable .
  ?v core:calibratedWith ?calibration .
  ?calibration core:hasUncertainty ?unc .
  ?unc core:hasValue ?val .
  FILTER (xsd:double(?val) < 0.5) .
  ?ds a core:Dataset .
  ?v core:foundInTable ?table .
  ?data core:foundInMeasurementTable ?table .
  ?ds core:includesPaleoData ?data
}

The following is a live URL that queries the SPARQL endpoint with the above query:

http://wiki.linked.earth/store/ds/query?query=PREFIX+core%3A+%3Chttp%3A%2F%2Flinked.earth%2Fontology%23%3E%0APREFIX+xsd%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0ASELECT+%3Fds+%3Fdata+%3Fv+%3Fval%0AWHERE+%7B%0A++%3Fv+a+core%3AInferredVariable+.%0A++%3Fv+core%3AcalibratedWith+%3Fcalibration+.%0A++%3Fcalibration+core%3AhasUncertainty+%3Func+.%0A++%3Func+core%3AhasValue+%3Fval+.%0A++FILTER+(xsd%3Adouble(%3Fval)+%3C+0.5)+.%0A++%3Fds+a+core%3ADataset+.%0A++%3Fv+core%3AfoundInTable+%3Ftable+.%0A++%3Fdata+core%3AfoundInMeasurementTable+%3Ftable+.%0A++%3Fds+core%3AincludesPaleoData+%3Fdata%0A%7D

Querying Linked Earth Data from another Program/Script

Because the endpoint at http://wiki.linked.earth/store/ds/query accepts remote requests, the same queries can be run from a program or script in any language that can make HTTP requests. The examples below use Python with the requests library.

Fetch all dataset names from the wiki

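import requests

url = "http://wiki.linked.earth/store/ds/query"

# List every dataset together with its name.
query = """PREFIX core: <http://linked.earth/ontology#>
SELECT ?ds ?name
WHERE {
  ?ds a core:Dataset .
  ?ds core:name ?name
}"""

# Send the query as a form-encoded POST and ask for SPARQL JSON results.
response = requests.post(
    url,
    data={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
response.raise_for_status()  # stop early on an HTTP error
res = response.json()

# Each binding maps a selected variable to a record with "type" and "value" keys.
for item in res["results"]["bindings"]:
    print(item["name"]["value"])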
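The loop above reads one variable from each binding. When a query selects several variables, it can help to flatten the response into plain rows first. The sketch below assumes the standard SPARQL JSON results layout (head.vars plus results.bindings); `flatten_bindings` is a hypothetical helper, and the sample response is made up for illustration rather than taken from the wiki.

```python
def flatten_bindings(result):
    """Turn a SPARQL JSON result into a list of {variable: value} dicts.

    Variables that are unbound in a given row are reported as None.
    """
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        row = {}
        for var in variables:
            row[var] = binding[var]["value"] if var in binding else None
        rows.append(row)
    return rows


# Made-up response in the SPARQL JSON results layout; not real wiki data.
sample = {
    "head": {"vars": ["ds", "name"]},
    "results": {
        "bindings": [
            {
                "ds": {"type": "uri", "value": "http://example.org/DatasetA"},
                "name": {"type": "literal", "value": "DatasetA"},
            },
            {"ds": {"type": "uri", "value": "http://example.org/DatasetB"}},
        ]
    },
}

rows = flatten_bindings(sample)
for row in rows:
    print(row["ds"], row["name"])
```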

Fetch all dataset csv files given a dataset name

import os
import sys
import requests
import urllib.request

# The dataset name (its wiki page name) is the first command-line argument.
datasetname = sys.argv[1]

endpoint = "http://wiki.linked.earth/store/ds/query"  # Query metadata
wikiapi = "http://wiki.linked.earth/wiki/api.php"  # Fetch file data

# Find the file name of every measurement table in the dataset's paleo data.
query = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
SELECT ?data ?filename
WHERE {
  wiki:""" + datasetname + """ core:includesPaleoData ?data .
  ?data core:foundInMeasurementTable ?table .
  ?table core:hasFileName ?filename
}"""

response = requests.post(endpoint, data={"query": query})
response.raise_for_status()
res = response.json()

for item in res["results"]["bindings"]:
    fileid = item["filename"]["value"]
    # Ask the MediaWiki API for the download URL of the uploaded file.
    fileresponse = requests.get(wikiapi, params={
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
        "titles": fileid,
    })
    fileresponse.raise_for_status()
    fileres = fileresponse.json()
    for pageid, fileitem in fileres["query"]["pages"].items():
        # Titles with no uploaded file come back without an "imageinfo" entry.
        if "imageinfo" not in fileitem:
            continue
        fileurl = fileitem["imageinfo"][0]["url"]
        print(fileurl)
        # Save the file in the current directory under its original name.
        urllib.request.urlretrieve(fileurl, os.path.basename(fileurl))
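The script above pastes the command-line argument straight into the query text, so a name containing spaces, braces or other punctuation would produce an invalid or unintended query. As a sketch, the name can be checked first. The pattern below is a deliberately conservative assumption (letters, digits, underscores, and inner dots or hyphens) and does not cover everything SPARQL allows in a prefixed name; `build_files_query` is a hypothetical helper.

```python
import re

# Conservative subset of what SPARQL accepts after "wiki:"; an assumption, not the full grammar.
_SAFE_LOCAL_NAME = re.compile(r"[A-Za-z0-9_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?")

_QUERY_TEMPLATE = """PREFIX core: <http://linked.earth/ontology#>
PREFIX wiki: <http://wiki.linked.earth/Special:URIResolver/>
SELECT ?data ?filename
WHERE {
  wiki:%s core:includesPaleoData ?data .
  ?data core:foundInMeasurementTable ?table .
  ?table core:hasFileName ?filename
}"""


def build_files_query(datasetname):
    """Return the file-listing query, refusing names that could alter the query."""
    if not _SAFE_LOCAL_NAME.fullmatch(datasetname):
        raise ValueError("unsafe dataset name: %r" % datasetname)
    return _QUERY_TEMPLATE % datasetname


# "Some_Dataset" is a placeholder, not a real dataset on the wiki.
print(build_files_query("Some_Dataset"))

try:
    build_files_query("x core:name ?n } #")
except ValueError as err:
    print(err)
```

Names rejected by this check may still be valid page names on the wiki; they would need to be written in whatever form the wiki uses for their resource identifiers, which this sketch does not attempt.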
