LinkedEarth Ontology
Revision as of 19:06, 13 April 2017
At its most fundamental level, the LinkedEarth Ontology allows us not only to define terms commonly used to describe a paleoclimate dataset (e.g., variable, uncertainty, calibration) but also to specify the relationships among those terms (e.g., a variable has uncertainty). As such, it allows us to make inferences, support complex queries, and perform quality control on the data.
When representing the knowledge of a domain like paleoclimatology, we usually distinguish the things that we want to describe (i.e., concepts such as a dataset or a variable) from the relationships used to describe those concepts (e.g., the name of the dataset or the value of the variable). As shown in the figure to the right, we can use a graph-based representation to encode the information in a set of triples.
Each triple has a subject (i.e., what we want to describe), a property (the element describing the subject), and an object (i.e., the values used to describe the subject).
Different concepts may be linked to each other using properties. For example, a dataset may contain a data table, which contains several variables. Each of these variables may have a different name and values. The set of properties and concepts used to describe a domain is known as an ontology. An ontology is defined as a "formal specification of a shared conceptualization" (http://iaoa.org/isc2012/docs/Guarino2009_What_is_an_Ontology.pdf). Ontologies represent consensual knowledge that helps a community describe the concepts of its domain using a common representation. A key feature of ontologies is that they are machine readable, i.e., they allow machines to understand the domain in the way the creators of the ontology have defined it. Thanks to the ontology, machines can navigate through data and discover data that would otherwise be hidden to them. This enables batch processing of data that would otherwise require a large number of person-hours.
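As a rough illustration of the triple representation described above, the following Python sketch stores a few triples and follows a chain of properties from a dataset down to a variable name. All identifiers (Dataset1, includesTable, and so on) are hypothetical and chosen only for this example; they are not terms from the LinkedEarth Ontology.

```python
from collections import namedtuple

# A triple: the subject being described, the property describing it,
# and the object (value) of that property.
Triple = namedtuple("Triple", ["subject", "property", "object"])

# Hypothetical identifiers, for illustration only.
triples = [
    Triple("Dataset1", "includesTable", "Table1"),
    Triple("Table1", "includesVariable", "Variable1"),
    Triple("Variable1", "name", "d18O"),
    Triple("Variable1", "hasUncertainty", "0.1"),
]


def objects_of(subject, prop):
    """Return every object linked to `subject` through `prop`."""
    return [t.object for t in triples
            if t.subject == subject and t.property == prop]


def follow(start, *props):
    """Follow a chain of properties through the graph, starting at `start`."""
    current = [start]
    for prop in props:
        current = [obj for subj in current for obj in objects_of(subj, prop)]
    return current


variable_names = follow("Dataset1", "includesTable", "includesVariable", "name")
```

Here `follow` walks from the dataset to its table, then to the table's variables, and finally to their names, which is the kind of navigation a machine can perform once the relationships are encoded explicitly.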
Why do we need an ontology?
Queries
One of the most practical uses of the ontology is to perform the kinds of complex queries that scientists run daily. For instance, "Please search the entire database for 10 coral d18O records that have been interpreted to represent temperature".
{{ #ask: [[Category:Dataset_©]] [[IncludesPaleoData_©.FoundInMeasurementTable_©.IncludesVariable_©.MeasuredOn_©::<q>[[Category:Coral]]</q>]] [[IncludesPaleoData_©.FoundInMeasurementTable_©.IncludesVariable_©.InterpretedAs_©.Name_©::T]] | ?IncludesPaleoData_©=PaleoData | format=broadtable | limit=10 }}
A quick look at the code above reveals the hierarchy of a dataset on the wiki (and in the corresponding LiPD file). In essence, the query asks the database to find the datasets (Category:Dataset_©),
- which include (Property:IncludesPaleoData_©) PaleoData (Category:PaleoData_©),
- that are found in (Property:FoundInMeasurementTable_©) a table (Category:MeasurementTable_©),
- that include (Property:IncludesVariable_©) variables (Category:Variable_©).
These variables need to fit two criteria:
- to be measured on (Property:MeasuredOn_©) an archive of type coral (Category:Coral)
- and for the variations to have been interpreted as (Property:InterpretedAs_©) temperature (or T)
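The traversal described above can be sketched in plain Python. This is only an illustration under stated assumptions: the nested dictionaries are toy data invented for the example, the key names reuse the property names from the query without the © suffix, and the sketch follows the prose description by requiring a single variable to meet both criteria. It is not how the wiki engine evaluates the query.

```python
# Toy, hypothetical records that mirror the property chain in the query.
datasets = [
    {
        "name": "ExampleCoralA",
        "IncludesPaleoData": [{
            "FoundInMeasurementTable": [{
                "IncludesVariable": [
                    {"Name": "d18O", "MeasuredOn": "Coral",
                     "InterpretedAs": [{"Name": "T"}]},
                ],
            }],
        }],
    },
    {
        "name": "ExampleCoralB",
        "IncludesPaleoData": [{
            "FoundInMeasurementTable": [{
                "IncludesVariable": [
                    # Measured on coral, but no interpretation recorded.
                    {"Name": "d18O", "MeasuredOn": "Coral",
                     "InterpretedAs": []},
                ],
            }],
        }],
    },
    {
        "name": "ExampleSedimentC",
        "IncludesPaleoData": [{
            "FoundInMeasurementTable": [{
                "IncludesVariable": [
                    # Interpreted as temperature, but not a coral archive.
                    {"Name": "d18O", "MeasuredOn": "MarineSediment",
                     "InterpretedAs": [{"Name": "T"}]},
                ],
            }],
        }],
    },
]


def variables_of(dataset):
    """Walk dataset -> PaleoData -> measurement table -> variable."""
    for paleo in dataset["IncludesPaleoData"]:
        for table in paleo["FoundInMeasurementTable"]:
            yield from table["IncludesVariable"]


def find_datasets(records, archive, interpretation, limit=10):
    """Names of datasets with a variable meeting both criteria."""
    results = []
    for dataset in records:
        if any(
            var["MeasuredOn"] == archive
            and any(i["Name"] == interpretation for i in var["InterpretedAs"])
            for var in variables_of(dataset)
        ):
            results.append(dataset["name"])
            if len(results) == limit:
                break
    return results


coral_temperature = find_datasets(datasets, "Coral", "T")
```

The `limit` argument plays the same role as `limit=10` in the query: the search stops once enough matching datasets have been collected.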
The sections below describe the difference between properties and categories on the wiki. Relating back to the triples defined above, Special:Categories on the wiki represent concepts and Special:Properties are the properties relating the various categories.
Remember that no formal knowledge about ontologies is required to use and contribute to the wiki.
Wiki Engine
It may not be obvious to you, but this very wiki is performing a lot of complex tasks that could not happen without this ontology.
Automation
The LinkedEarth Ontology
The ontology is organic, meaning that it is designed to "grow" as more and more records are added to the wiki and researchers need to define new terms or redefine existing ones.
The LinkedEarth ontology is divided into several components:
- The LiPD Ontology: As its name indicates, this part of the ontology concerns itself with the formatting of a Dataset and follows the LiPD format very closely. You may have noticed that many of the categories and properties associated with the LiPD ontology are followed by a copyright sign. This sign indicates that the categories and properties are considered "core". Core properties are essential for the development of code supported by LiPD, such as the LiPD Utilities, GeoChronR, and Pyleoclim. Therefore, changes to these core categories and properties must be approved by the Editorial Board. To suggest changes to a term in the core ontology, start a discussion on the category or property page. Once a community consensus has been reached, use this form to contact the Editorial Board about the change.
- The Proxy Archive Ontology defines the different categories of archive types used in paleoclimate studies (such as marine sediment or coral) following the definition by Evans et al. (2013) [1]. This ontology is the product of an evolving community effort. New proxy archive categories can be automatically created by uploading a LiPD file onto the LinkedEarth wiki or by manually setting the Property:HasProxyArchive_© to the new archive type.
- The Proxy Observation Ontology defines the various proxy observations made on the proxy archives following the definition by Evans et al. (2013) [1]. This ontology is also a product of an evolving community effort. New proxy observation categories can be automatically created by uploading a LiPD file onto the LinkedEarth wiki or by manually setting the Property:HasProxyObservation_© to the new observation type.
- The Proxy Sensor Ontology defines the various types of proxy sensors following the definition by Evans et al. (2013) [1]. This ontology is also a product of an evolving community effort. New proxy sensor categories can be automatically created by uploading a LiPD file onto the LinkedEarth wiki or by manually setting the Property:HasProxySensor_© to the new sensor type. If the sensor is not specified in the LiPD file, the wiki is designed to make an educated guess based on the proxy archive and proxy observation.
- The Instrument Ontology aims to define the various types of instruments used to produce the proxy observations. This ontology is also crowd-sourced, and new categories of instruments can be added by setting the category page to the new type of instrument.
- The Inferred Variable Ontology aims to provide a taxonomy of the various inferred variables. This ontology is also crowd-sourced and new categories of inferred variables can be added by uploading a LiPD file onto the LinkedEarth wiki or setting the Property:OnInferredVariableProperty to the desired type of inferred variable.