LinkedEarth Ontology

From Linked Earth Wiki

At its most fundamental level, the LinkedEarth Ontology allows us not only to define terms commonly used to describe a paleoclimate dataset (e.g., variable, uncertainty, calibration) but also to specify the relationships among those terms (e.g., a variable has an uncertainty). As such, it allows us to make inferences, support complex queries, and perform quality control on the data.

Figure: Example of a triple, consisting of a subject (the dataset), a property (hasName), and an object (the name of the dataset, WesternPacific_Khider_2014).

When representing the knowledge of a domain like paleoclimatology, we usually distinguish the things we want to describe (i.e., concepts such as a dataset or a variable) from the relationships used to describe those concepts (e.g., the name of the dataset or the value of the variable). As shown in the figure to the right, we can use a graph-based representation to encode this information as a set of triples.

Each triple has a subject (i.e., what we want to describe), a property (the element describing the subject), and an object (i.e., the values used to describe the subject).
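To make the subject/property/object structure concrete, here is a minimal Python sketch that stores the triple from the figure as a small tuple. The `Triple` type, the `Dataset:` prefix on the subject and the `describe` helper are illustrative assumptions; this is not how the wiki stores its data internally.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement of the form: subject --property--> object."""
    subject: str
    property: str
    object: str


# The triple from the figure. The "Dataset:" prefix is a hypothetical way
# of marking the subject as a dataset page rather than a plain string.
name_triple = Triple(
    subject="Dataset:WesternPacific_Khider_2014",
    property="hasName",
    object="WesternPacific_Khider_2014",
)


def describe(t: Triple) -> str:
    """Render a triple as a one-line, human-readable statement."""
    return f"{t.subject} {t.property} {t.object!r}"


print(describe(name_triple))
# Dataset:WesternPacific_Khider_2014 hasName 'WesternPacific_Khider_2014'
```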

Different concepts may be linked to each other using properties. For example, a dataset may contain a data table, which contains several variables, each with its own name and values. The set of concepts and properties used to describe a domain is known as an ontology, defined as a "formal specification of a shared conceptualization". Ontologies represent consensual knowledge that helps a community describe the concepts of its domain using a common representation.

Ontologies are also machine readable: they allow machines to interpret the domain in the way the creators of the ontology defined it. Thanks to the ontology, machines can navigate through the data and discover information that would otherwise be hidden from them. This enables batch processing of data that would otherwise require a large number of person-hours.
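The idea that machines can navigate linked concepts can be sketched in a few lines of Python. Assuming a small, hypothetical graph of triples (the dataset, table and variable identifiers and the property names `containsTable`, `includesVariable` and `hasName` are all made up for illustration), a program can follow a chain of properties from a dataset down to the names of its variables without being told in advance where they are stored.

```python
# Hypothetical triples: a dataset contains a data table, which contains two variables.
triples = [
    ("ExampleDataset", "containsTable", "Table1"),
    ("Table1", "includesVariable", "Variable_d18O"),
    ("Table1", "includesVariable", "Variable_depth"),
    ("Variable_d18O", "hasName", "d18O"),
    ("Variable_depth", "hasName", "depth"),
]


def objects(subject, prop, graph):
    """All objects linked to `subject` through the property `prop`."""
    return [o for s, p, o in graph if s == subject and p == prop]


def follow(start, path, graph):
    """Follow a chain of properties from `start`; return every node reached at the end."""
    nodes = [start]
    for prop in path:
        nodes = [o for n in nodes for o in objects(n, prop, graph)]
    return nodes


print(follow("ExampleDataset", ["containsTable", "includesVariable", "hasName"], triples))
# ['d18O', 'depth']
```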

Why do we need an ontology?

Queries

One of the most practical uses of the ontology is to support the kinds of complex queries that scientists perform daily. For instance: "Please search the entire database for 10 coral records that have been interpreted to represent temperature."

{{ #ask: 
[[Category:Dataset_(L)]] 
[[IncludesPaleoData_(L).FoundInMeasurementTable_(L).IncludesVariable_(L).HasProxySystem_(L).ProxyArchiveType_(L)::Coral]]
[[IncludesPaleoData_(L).FoundInMeasurementTable_(L).IncludesVariable_(L).InterpretedAs_(L).Name_(L)::T]]
 | ?IncludesPaleoData_(L)=PaleoData
 | format=broadtable
 | limit=10
}}


The code above mirrors the hierarchy of a dataset on the wiki (and in the corresponding LiPD file). In essence, the query asks the database to find datasets (Category:Dataset_(L)),

  1. which include (Property:IncludesPaleoData_(L)) PaleoData (Category:PaleoData_(L)),
  2. that are found in (Property:FoundInMeasurementTable_(L)) a table (Category:MeasurementTable_(L)),
  3. that include (Property:IncludesVariable_(L)) variables (Category:Variable_(L)).

These variables need to meet two criteria:

  1. their proxy system (Property:HasProxySystem_(L)) must have a proxy archive type (Property:ProxyArchiveType_(L)) of coral (Category:Coral),
  2. and they must have been interpreted as (Property:InterpretedAs_(L)) temperature (an interpretation named T).
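For readers more familiar with general-purpose code than with wiki query syntax, the same search can be sketched in Python. The nested records below are hypothetical and only mimic the Dataset, PaleoData, MeasurementTable, Variable hierarchy; the field names (`paleoData`, `tables`, `proxyArchiveType`, `interpretedAs`) are assumptions, not LiPD keys. As in the description above, this sketch applies both criteria to the same variable and returns at most 10 matches.

```python
# Hypothetical records mirroring the wiki hierarchy:
# Dataset -> PaleoData -> MeasurementTable -> Variable.
datasets = [
    {"name": "CoralA", "paleoData": [{"tables": [{"variables": [
        {"proxyArchiveType": "Coral", "interpretedAs": ["T"]}]}]}]},
    {"name": "CoralB", "paleoData": [{"tables": [{"variables": [
        {"proxyArchiveType": "Coral", "interpretedAs": ["precipitation"]}]}]}]},
    {"name": "SedimentC", "paleoData": [{"tables": [{"variables": [
        {"proxyArchiveType": "MarineSediment", "interpretedAs": ["T"]}]}]}]},
]


def variables(dataset):
    """Yield every variable reachable through PaleoData -> MeasurementTable."""
    for paleo in dataset["paleoData"]:
        for table in paleo["tables"]:
            yield from table["variables"]


def matches(variable):
    """Both criteria: a coral archive and an interpretation named 'T'."""
    return (variable.get("proxyArchiveType") == "Coral"
            and "T" in variable.get("interpretedAs", []))


def coral_temperature(datasets, limit=10):
    """Names of up to `limit` datasets with at least one matching variable."""
    hits = [d["name"] for d in datasets if any(matches(v) for v in variables(d))]
    return hits[:limit]


print(coral_temperature(datasets))  # ['CoralA']
```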

The sections below describe the difference between properties and categories on the wiki. Relating back to the triples defined above, categories on the wiki (Special:Categories) represent concepts, and properties (Special:Properties) relate the various categories to each other.

Remember that no formal knowledge about ontologies is required to use and contribute to the wiki.

The Ontology as fuel for the Wiki Engine

It may not be obvious, but this wiki performs many complex tasks that would not be possible without the LinkedEarth Ontology. The ontology is used to visualize datasets on maps, to help users search for structured content, and to support the curation of existing articles by flagging incomplete metadata.

The ontology is also used by external programs. In our latest release, the ontology has been mapped to vocabularies such as Schema.org, which makes it easier for search engines to interpret. In addition, because the wiki accepts complex queries to retrieve dataset metadata, scientists can query all of the information stored in it.
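As an illustration of what a mapping to Schema.org can look like, the sketch below turns a hypothetical wiki record into a JSON-LD document that uses Schema.org's `Dataset` and `Person` types. The input field names, the record itself and the choice of which fields map to `name`, `creator` and `keywords` are assumptions made for this example; they are not the mapping used in the LinkedEarth release.

```python
import json


def to_schema_org(record):
    """Map a few hypothetical wiki fields onto Schema.org's Dataset type (JSON-LD)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": record["name"],
        # Hypothetical choice: expose archive types as search keywords.
        "keywords": record.get("archiveTypes", []),
    }
    if "authors" in record:
        doc["creator"] = [{"@type": "Person", "name": a} for a in record["authors"]]
    return doc


# Hypothetical record, not a real dataset.
record = {"name": "Example_Author_2020", "authors": ["Author"], "archiveTypes": ["Coral"]}
print(json.dumps(to_schema_org(record), indent=2))
```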

Automation

We are in the process of creating bots to help automate some of the manual curation tasks undertaken by wiki users. For example, even though the wiki helps users describe the datasets they upload, errors sometimes slip in (identifiers may not follow a convention, units may be expressed inconsistently, etc.). Bots can check the consistency of identifiers, detect duplicate pages, and propose automated fixes.
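A consistency-checking bot could look something like the following Python sketch. The naming pattern (Location_Author_Year, inferred only from names such as WesternPacific_Khider_2014), the unit alias table and the duplicate-detection rule are all hypothetical choices made for illustration, not the conventions the LinkedEarth bots enforce.

```python
import re
from collections import defaultdict

# Hypothetical naming convention: <Location>_<Author>_<Year>.
NAME_PATTERN = re.compile(r"^[A-Za-z0-9]+_[A-Za-z]+_\d{4}$")

# Hypothetical spellings that should all be treated as the same unit.
UNIT_ALIASES = {"degc": "degC", "deg c": "degC", "°c": "degC", "‰": "permil"}


def check_name(name):
    """True if a dataset name follows the (hypothetical) convention."""
    return bool(NAME_PATTERN.match(name))


def normalize_unit(unit):
    """Map known alternative spellings to one canonical form; leave others as-is."""
    cleaned = unit.strip()
    return UNIT_ALIASES.get(cleaned.lower(), cleaned)


def find_duplicates(names):
    """Group page names that differ only in case or separators."""
    groups = defaultdict(list)
    for n in names:
        groups[re.sub(r"[^a-z0-9]", "", n.lower())].append(n)
    return [g for g in groups.values() if len(g) > 1]


print(find_duplicates(["WesternPacific_Khider_2014", "westernpacific-khider-2014"]))
```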

Bots can also help generate daily statistics that show how the wiki grows over time and how authors collaborate with each other. We also plan to use them to create weekly summaries of contributions and to track proposed changes to terms in the ontology.
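The statistics mentioned above reduce to simple aggregations over the wiki's edit history. This sketch assumes a hypothetical list of page-creation events and a hypothetical mapping from pages to the users who edited them; it computes daily and cumulative page counts, and counts how often each pair of users has edited the same page.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Hypothetical page-creation events: (page name, creation date).
events = [
    ("PageA", date(2017, 5, 1)),
    ("PageB", date(2017, 5, 1)),
    ("PageC", date(2017, 5, 2)),
    ("PageD", date(2017, 5, 4)),
]

# Hypothetical edit history: page name -> set of users who edited it.
editors = {
    "PageA": {"alice", "bob"},
    "PageB": {"alice", "bob", "carol"},
}


def daily_growth(events):
    """Return (day, pages created that day, cumulative total), in date order."""
    per_day = Counter(day for _, day in events)
    rows, total = [], 0
    for day in sorted(per_day):
        total += per_day[day]
        rows.append((day, per_day[day], total))
    return rows


def collaborations(editors):
    """Count, for each pair of users, the number of pages both have edited."""
    pairs = Counter()
    for users in editors.values():
        for pair in combinations(sorted(users), 2):
            pairs[pair] += 1
    return pairs


for day, new, total in daily_growth(events):
    print(day.isoformat(), new, total)
print(collaborations(editors).most_common(1))  # [(('alice', 'bob'), 2)]
```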

The LinkedEarth Ontology

The ontology is organic, meaning that it is designed to "grow" as more and more records are added to the wiki and researchers need to define new terms or redefine existing ones.

The LinkedEarth ontology is divided into several components:

  • The LiPD Ontology: As its name indicates, this part of the ontology concerns itself with the formatting of a dataset and follows the LiPD format very closely. You may have noticed that many of the categories and properties associated with the LiPD ontology are followed by (L). This suffix indicates that the categories and properties are considered "core" and part of the LiPD framework (hence the L). Core properties are essential for the development of code that supports LiPD, such as the LiPD Utilities, GeoChronR, and Pyleoclim. Therefore, changes to these core categories and properties must be approved by the Editorial Board. To suggest a change to a term in the core ontology, start a discussion on the category or property page. Once the community has reached a consensus, use this form to contact the Editorial Board about the change.
  • The Proxy Archive Ontology defines the different categories of archive types used in paleoclimate studies (e.g., marine sediment, coral) following the definition by Evans et al. (2013) [1]. This ontology is the product of an evolving community effort. New proxy archive categories can be created automatically by uploading a LiPD file onto the LinkedEarth wiki or manually by setting the Property:ProxyArchiveType_(L) to the new archive type.
  • The Proxy Observation Ontology defines the various proxy observations made on the proxy archives following the definition by Evans et al. (2013) [1]. This ontology is also a product of an evolving community effort. New proxy observation categories can be automatically created by uploading a LiPD file onto the LinkedEarth wiki or by manually setting the Property:ProxyObservationType_(L) to the new observation type.
  • The Proxy Sensor Ontology defines the various types of proxy sensors following the definition of Evans et al. (2013) [1]. This ontology is also a product of an evolving community effort. New proxy sensor categories can be created automatically by uploading a LiPD file onto the LinkedEarth wiki or manually by setting the Property:ProxySensorType_(L) to the new sensor type. If the sensor is not specified in the LiPD file, the wiki is designed to make an educated guess based on the proxy archive and proxy observation.
  • The Instrument Ontology aims to define the various types of instruments used to produce the proxy observations. This ontology is also crowd-sourced, and new categories of instruments can be added by setting the category of the page to the new instrument type.
  • The Inferred Variable Ontology aims to provide a taxonomy of the various inferred variables. This ontology is also crowd-sourced and new categories of inferred variables can be added by uploading a LiPD file onto the LinkedEarth wiki or setting the Property:InferredVariableType_(L) to the desired type of inferred variable.
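Two behaviours described in the list above, the automatic creation of a new category when a dataset uses an unseen type and the educated guess of a proxy sensor when none is specified, can be sketched as follows. The set of known archive types and every entry in the sensor lookup table are hypothetical placeholders; the wiki's actual inference rules are not described on this page.

```python
# Hypothetical starting set of archive-type categories.
known_archive_types = {"Coral", "MarineSediment"}

# Hypothetical (archive type, observation type) -> sensor type rules.
# Entries are illustrative placeholders, not the wiki's actual rules.
SENSOR_GUESSES = {
    ("MarineSediment", "Mg/Ca"): "Foraminifera",
    ("Wood", "RingWidth"): "Tree",
}


def register_archive_type(value, known):
    """Add an unseen archive type as a new category; return True if it was new."""
    is_new = value not in known
    known.add(value)
    return is_new


def guess_sensor(archive, observation, declared=None):
    """Use the sensor declared in the LiPD file if present, else try the lookup table."""
    if declared:
        return declared
    # Returns None when no educated guess is available.
    return SENSOR_GUESSES.get((archive, observation))


print(register_archive_type("Speleothem", known_archive_types))  # True
print(register_archive_type("Coral", known_archive_types))       # False
print(guess_sensor("MarineSediment", "Mg/Ca"))                   # Foraminifera
```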

Editing the LinkedEarth ontology

Terms contributed to the crowd ontology are immediately added to the wiki and are usable by the community at large. If a crowd term needs to be incorporated into the core ontology (terms followed by (L) on the wiki), or if a core term needs to be modified, please use this form to make the request to the Editorial Board. The full procedure explaining how the Editorial Board implements a change is described in this document.
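The routing rule above can be summarized in a short Python sketch. The `_(L)` suffix check reflects how core terms appear in wiki page names such as Category:Dataset_(L); the function names, the example term Category:MyNewInstrument and the returned labels are hypothetical.

```python
CORE_SUFFIX = "_(L)"


def is_core(term):
    """Core (LiPD) terms carry the (L) suffix in their wiki page names."""
    return term.endswith(CORE_SUFFIX)


def route_change(term, promote_to_core=False):
    """Decide how a proposed change to `term` is handled under the policy above."""
    if is_core(term) or promote_to_core:
        # Core terms, and crowd terms proposed for the core, go to the Editorial Board.
        return "request to Editorial Board"
    # Crowd terms are added to the wiki immediately.
    return "applied immediately"


print(route_change("Property:ProxyArchiveType_(L)"))                   # request to Editorial Board
print(route_change("Category:MyNewInstrument"))                        # applied immediately
print(route_change("Category:MyNewInstrument", promote_to_core=True))  # request to Editorial Board
```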


References

  1. Evans, M. N., Tolwinski-Ward, S. E., Thompson, D. M., & Anchukaitis, K. J. (2013). Applications of proxy system modeling in high resolution paleoclimatology. Quaternary Science Reviews, 76, 16-28. doi:10.1016/j.quascirev.2013.05.024