Difference between revisions of "Best Practices"

(Third Party contributors: Draft section)
(Nomenclature: Finish nomenclature advice)
 
(36 intermediate revisions by the same user not shown)
Line 20: Line 20:
 
| What constitutes a [[:Category:DataTable_(L) | data table]]? || See [[Creating a LiPD file#What constitutes a measurement table? | this page]].
 
| What constitutes a [[:Category:DataTable_(L) | data table]]? || See [[Creating a LiPD file#What constitutes a measurement table? | this page]].
 
|-
 
|-
| Updating datasets following a compilation||  
+
| Updating datasets following a compilation|| This section
 
|-
 
|-
| Updating datasets following the creation of a new model output ||  
+
| Updating datasets following the creation of a new model output || This section
 
|-
 
|-
| Updating datasets following the creation of new raw measurements ||  
+
| Updating datasets following the creation of new raw measurements || This section
 
|}
 
|}
  
=== New Datasets ===
+
===New vs legacy datasets===
 +
 
 +
''New [[:Category:Dataset (L) | datasets]]'' are datasets that have recently been published and are often contributed by the original [[:Property:Contributor (L) | contributor]] of the study or someone closely associated with the creation of the datasets. This definition also includes older [[:Category:Dataset (L) | datasets]] that the PI may have placed on other public databases or has not yet uploaded anywhere. In this instance, the [[:Property:Contributor (L) | contributors]] and the LinkedEarth member uploading the [[:Category:Dataset (L) | dataset]] may be the same person. Therefore, most of the metadata fields can be filled in by the person who was involved in the study, since they likely have the information readily available.
 +
 
 +
''Legacy [[:Category:Dataset (L) | datasets]]'' are datasets that are publicly available (i.e., either on another database or published under U.S. funding) and are contributed by a LinkedEarth member not originally involved in the creation of the [[:Category:Dataset (L) | dataset]]. For datasets that are not publicly available (i.e., emailed directly to the LinkedEarth member by the original [[:Property:Contributor (L) | contributors]]), we recommend informing the [[:Property:Contributor (L) | contributors]] of your intent to upload their [[:Category:Dataset (L) | dataset]] on the LinkedEarth wiki.
 +
 
 +
The guidelines suggested below apply to both new and legacy [[:Category:Dataset (L) | datasets]].
 +
 
 +
=== Versioning system ===
 +
 
 +
One of the properties of a [[:Category:Dataset (L) | dataset]] is the [[:Property:DatasetVersion (L) | dataset version]]. In LinkedEarth, the [[:Property:DatasetVersion (L) | dataset version]] follows the x.y.z notation where:
 +
* x refers to changes in metadata and data following a publication. Examples of such changes include the creation of a new age model as part of a compilation or comparison or changes in the way a [[:Category:MeasuredVariable (L) | measured variable]] is [[:Property:CalibratedVia (L) | calibrated]] to obtain an [[:Category:InferredVariable (L) | inferred variable]] (i.e. applying a different [[:Category:CalibrationModel (L) | calibration model]]).
 +
* y refers to changes to the data following a publication. Examples include adding data further back in time without changing the [[:Category:Model (L) | model]] underlying the [[:Category:Interpretation (L) | interpretation]].
 +
* z refers to changes not associated with a publication, such as fixing typos or adding metadata taken either from the [[:Category:Publication (L) | publication]] or from the original [[:Property:Contributor (L) | contributor]] of the data (e.g., information from a laboratory notebook).
 +
 
 +
After the initial  [[Special:WTLiPD | upload]], set the [[:Property:DatasetVersion (L) | dataset version]] to '0.0.0'.
 +
 
 +
'''Note''': The [[:Property:DatasetVersion (L) | dataset version]] is different from the [[:Property:CompilationVersion (L) | compilation version]]. The versioning system of each [[:Category:Compilation (L) | compilation]] is left at the discretion of the group who created the [[:Category:Compilation (L) | compilation]] but should be explained on the [[:Category:Compilation (L) | compilation]] page.
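
The x.y.z rules above can be sketched in a few lines of Python. This is only an illustration: the function name <code>bump_version</code> and the change labels are hypothetical, and resetting the lower components when a higher one increases is an assumption borrowed from common versioning practice, not something these guidelines specify.

```python
# Hypothetical helper illustrating the x.y.z dataset version rules.
# Assumption (not stated in the guidelines): bumping x resets y and z,
# and bumping y resets z.

def bump_version(version, change):
    """Return the next dataset version string.

    change is one of (labels are illustrative only):
      'model' -> x: data and metadata change following a publication
                    (e.g., a new age model or calibration model)
      'data'  -> y: data change following a publication
                    (e.g., data added further back in time)
      'minor' -> z: change not associated with a publication
                    (e.g., a typo fix or added metadata)
    """
    x, y, z = (int(part) for part in version.split("."))
    if change == "model":
        return f"{x + 1}.0.0"
    if change == "data":
        return f"{x}.{y + 1}.0"
    if change == "minor":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

For example, a typo fix on a freshly uploaded dataset would take the version from '0.0.0' to '0.0.1' under these assumptions.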
 +
 
 +
=== Uploading a dataset for the first time on the wiki ===
 
{{See also|Creating a LiPD file}}
 
  
 
We '''strongly''' recommend first creating a [[Linked Paleo Data | LiPD file]] rather than entering all the data and metadata from scratch on the wiki. As of April 2017, the most expeditious way to convert your data into the [[Linked Paleo Data | LiPD]] format is to use our Excel Template ([[:File:LiPDv1.2 template.xlsx]]) and the Python [[LiPD Utilities]]. This [[Creating a LiPD file | guide]] will assist you in entering the necessary data and metadata information.  
 
  
If you decide to enter a dataset manually (yes, it is a possible although '''extremely painful''' process):
+
Once your [[:Category:Dataset (L) | dataset]] is in [[Linked Paleo Data | LiPD]] format, you can [[Special:WTLiPD | upload]] it on the wiki. This will automatically create most of the pages. Check that all the information is correct and once satisfied, update the [[:Property:DatasetVersion (L) | dataset version]] to '0.0.0'.
 +
 
 +
If you decide to enter a dataset manually (not recommended):
 
# Upload your data in csv format using the ''''Upload File'''' link in the sidebar. Make sure you name them appropriately by referring to the [[#Nomenclature | nomenclature section]] on this page. The wiki will suggest names for you to use.  
 
# [[Quick Guide to Editing Wiki Pages#Creating a new wiki page | Create a new page]] using the name ''SiteName.DatasetYear.ContributorName'' and set the Category of the new page to [[:Category:Dataset (L)]]. '''Note''': To be able to create a page, you need to enter some text in the WikiText box. You'll be able to delete it the page create by clicking on ''edit'' at the top of the page.
+
# [[Quick Guide to Editing Wiki Pages#Creating a new wiki page | Create a new page]] using the name ''SiteName.DatasetYear.ContributorName'' and set the Category of the new page to [[:Category:Dataset (L)]]. '''Note''': To be able to create a page, you need to enter some text in the WikiText box. You'll be able to delete this extra text from the page after you create it by clicking on ''edit'' at the top of the page.
 
# The wiki will automatically suggest standard properties. Answer as many as possible. '''Note''': If the answer to a Property results in the creation of a new class (i.e., the box doesn't specify text or number), then you'll be essentially creating a new wiki page. Follow our [[#Nomenclature | nomenclature]]. If you make a typo, just fixing the typo in the link will not automatically redirect the page. The best approach is to [[Quick Guide to Editing Wiki Pages#Renaming wiki pages | rename the landing page]].
 
  
=== Existing Datasets ===
+
=== Changes to a dataset already on the wiki ===
  
For existing datasets, we recommend updating the data and metadata directly on the wiki rather than uploading a new LiPD file. Please follow the guidelines below for [[#Original contributors | original contributors]] and [[#Third Party contributors | third party contributors]].
+
For existing datasets, we recommend updating the data and metadata directly on the wiki rather than uploading a new LiPD file.
  
One major exception to this rule is an update that requires adding a new variable (i.e., adding a column to a csv file) to an '''existing''' [[:Category:DataTable (L) | data table]] for the original contributor of the dataset.  
+
All changes to a dataset after the initial upload require a change in the version of the file as outlined [[#Versioning system | here]]. If you are planning to make a series of updates over the course of several days as part of the same work, only update the [[:Property:DatasetVersion (L) | dataset version]] once you're through with all the changes.
  
*''To update data'':
+
==== Changing existing data ====
It may be require to update a new csv file to fix a minor typo in the data.
+
  
==== Versioning system ====
+
[[File:BestPractices Download csv file.png|thumb|right|400px|Downloading a csv file from the wiki]]
 +
[[File:BestPractices upload NewVersion.png|thumb|right|400px|Uploading a new version of a file on the wiki]]
  
==== Original contributors ====
+
Only the original [[:Property:Contributor (L) | contributor]] to the data and the person uploading the [[:Category:Dataset (L) | dataset]] can override the original csv file.
  
You can update your dataset at any time on the wiki. Just follow the [[#versioning system | versioning]] and [[#Nomenclature | nomenclature]] rules.
+
If the change requires creating another column or changing the underlying [[:Category:CalibrationModel (L) | calibration]], you should follow the instruction on adding data tables.
  
==== Third Party contributors ====
+
To update data:
 +
# go to [[Special:Listfiles | this page]] and search for the name of the csv file you need to update.
 +
# Download the contributed csv file onto your computer by right-clicking on the name
 +
# Make the necessary corrections to the file and save it, '''using the same file name'''
 +
# To re-upload to the wiki, go back to the file page from which you originally downloaded the file.
 +
# Click on ''Upload a new version of this file'' at the bottom of the page.
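
The correction in step 3 can also be scripted. The sketch below is a hypothetical helper (<code>correct_cell</code> is not part of any LinkedEarth tool) that overwrites a single cell of the downloaded csv file and saves it under the same file name, so that the file is ready to be re-uploaded as a new version.

```python
import csv

def correct_cell(path, row, column, new_value):
    """Overwrite one cell of a csv file in place, keeping the file name.

    row and column are zero-based indices; the helper name and its
    arguments are illustrative only.
    """
    # Read the whole table into memory.
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    # Apply the correction.
    rows[row][column] = new_value
    # Write back to the same path so the file name is unchanged.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```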
  
Anyone with basic editor privilege can edit wiki pages. Below are suggested etiquette rules for editing [[:Category:Dataset (L) | datasets]] that you haven't originally contributed. If you're concerned about your own dataset, please remember that you can add these pages to [[:Special:Watchlist | your watchlist]] and receive an email every time an update is being made.
+
==== Changing existing metadata ====
+
 
*'''Typo''': The power of the wiki is to be able to fix minor mistakes on the fly. If you see a typo, correct it and update the [[:Property:DatasetVersion (L) | dataset version]] using these [[#versioning system | rules]]. Examples of typos include, misspelling a standardized term. These include entering "MgCa" instead of "Mg/Ca" or "T" instead of "Temperature". See the [http://linked.earth/ontology/archive/1.0.0/index-en.html Proxy Archive Ontology], the [http://linked.earth/ontology/observation/1.0.0/index-en.html Proxy Observation Ontology], the [http://linked.earth/ontology/sensor/1.0.0/index-en.html Proxy Sensor Ontology], the [http://linked.earth/ontology/inferredVariable/1.0.0/index-en.html Inferred Variable Ontology], and the [http://linked.earth/ontology/instrument/1.0.0/index-en.html Instrument Ontology] for details.  
+
The LinkedEarth wiki is meant as a collaborative platform for the curation of paleoclimate data. As such, anyone with basic editor privileges (i.e., a LinkedEarth member) can edit wiki pages.
* '''Changes to the originally-contributed data''' (i.e., the data stored in the .csv files).  
+
 
By default, only the original [[:Property:Contributor (L) | contributor]] and the person who uploaded the data on the wiki can overwrite existing csv file. If you spot a mistake, first contact these persons. If you do not get an answer after '''14 business days''', [mailto:linkedearth@gmail.com contact] the [http://linked.earth/aboutus/team-members-page/ LinkedEarth team].
+
If you're concerned about changes to your own [[:Category:Dataset (L) | dataset]], please remember that you will receive a notification email when the pages are updated by another member of the community. If you do not agree with the changes being made to your [[:Category:Dataset (L) | dataset]], we suggest the following:
* '''Changes to already existing metadata''':
+
# Contact the LinkedEarth member who made the change using the [[Discussion Page Tutorial | discussion page]] for the wiki page.
Please contact the person who made the contribution on the wiki. If you do not get an answer after '''14 business days''', [mailto:linkedearth@gmail.com contact] the [http://linked.earth/aboutus/team-members-page/ LinkedEarth team].
+
# If you do not receive an answer within '''7 business days''', try contacting the user by email.
* '''Adding new metadata, including if used in a compilation''':
+
# If you cannot resolve the issue within '''30 business days''', [mailto:linkedearth@gmail.com contact us].
If you wish to add new metadata to an existing [[:Category:Dataset (L) | dataset]], do so directly on the wiki.
+
 
 +
Remember that these changes could be as simple as typo fixes and may be made automatically by the LinkedEarth team to bring your dataset up to date with the current ontology. See the [http://linked.earth/ontology/archive/1.0.0/index-en.html Proxy Archive Ontology], the [http://linked.earth/ontology/observation/1.0.0/index-en.html Proxy Observation Ontology], the [http://linked.earth/ontology/sensor/1.0.0/index-en.html Proxy Sensor Ontology], the [http://linked.earth/ontology/inferredVariable/1.0.0/index-en.html Inferred Variable Ontology], and the [http://linked.earth/ontology/instrument/1.0.0/index-en.html Instrument Ontology] for details.
 +
 
 +
==== Adding metadata only ====
 +
 
 +
You can add metadata easily on the wiki. The addition of metadata does not necessarily have to follow a [[:Category:Publication (L) | publication]]. For instance, one LinkedEarth member can upload a legacy dataset in May 2017. In October 2017, another member, perhaps more familiar with the study, may add further information. As previously mentioned, the member who originally uploaded the dataset will receive an email saying that these pages have been changed. We anticipate such changes when not all the information, especially that pertaining to [[:Category:Instrument (L) | instrumentation]], [[:Category:Uncertainty (L) | uncertainty]], or the [[:Category:PhysicalSample (L) | physical sample]], was available in the original manuscript.
 +
 
 +
Another example can involve the same [[:Property:Contributor (L) | contributor]] uploading the minimal required data and metadata in haste to meet the journal requirements and deadlines, then deciding months later to add the recommended and desired metadata.
 +
 
 +
==== Adding data ====
 +
 
 +
'''Note:''' Adding data will automatically create new metadata.
 +
 
 +
If the updates involve extensive changes to [[:Category:MeasuredVariable (L) | measured variables]], [[:Category:InferredVariable (L) | inferred variables]], or [[:Category:Model (L) | models]] (e.g., age model, calibration) following a new [[:Category:Publication (L) | publication]], we recommend creating a new LiPD file, and consequently a new [[:Category:Dataset (L) | dataset page]]. The new [[:Category:Dataset (L) | dataset]] should contain information relating to the first [[:Category:Publication (L) | publication]] and a [[:Property:notes (L) | note]] explaining the relationships between the two [[:Category:Dataset (L) | datasets]].
 +
 
 +
==== Adding Ensemble, distribution and summary tables ====
 +
 
 +
[[File:TutorialFig23.png | thumb | right | 400px | Uploading a .csv file on the wiki]]
 +
 
 +
[[File:WikiTutorial AddingAnEnsembleTable.png | thumb | right | 400px | Adding an ensemble table page to the wiki]]
 +
 
 +
[[File:WikiTutorial AddVariables.png | thumb | right | 400px | Location of the 'Provide Data Name' button on the wiki page.]]
 +
 
 +
[[File:WikiTutorial LinkingCSVFileToDataTablePage.png | thumb | right | 400px | Enter the name of the Excel file]]
 +
 
 +
[[File:WikiTutorial DataTablePage.png | thumb | right | 400px | Completed EnsembleDataTable page]]
 +
 
 +
If a Bayesian computation was used to derive the inferred variables (an increasingly popular method for age modeling thanks to packages such as [https://cran.r-project.org/web/packages/Bchron/vignettes/Bchron.html Bchron] and [http://www.chrono.qub.ac.uk/blaauw/bacon.html Bacon]), as many as three tables can be generated:
 +
* [[:Category:EnsembleTable (L) | Ensemble tables]] are used to store members of the ensemble. We recommend not storing more than 1000 individual members on the wiki.
 +
* [[:Category:SummaryTable (L) | Summary tables]] store statistics of the ensemble such as the median, quartiles, and quantiles.
 +
* [[:Category:DistributionTable (L) | Distribution tables]] store the output distribution for a specific horizon. For instance, the calendar age distribution at a specific horizon where radiocarbon was measured.
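
As an illustration of how a summary table relates to an ensemble table, the sketch below computes the median and quartiles at each depth from a list of ensemble members using Python's standard <code>statistics</code> module. The function name <code>summarize_ensemble</code> and the choice of the "inclusive" quantile method are assumptions made for this example; the wiki does not prescribe how the statistics should be computed.

```python
import statistics

def summarize_ensemble(members):
    """Compute per-depth summary statistics from ensemble members.

    members: list of realizations, each a list with one value per depth
             (all realizations must have the same length, and at least
             two realizations are needed to compute quartiles).
    Returns one dict per depth with the first quartile, median, and
    third quartile across the ensemble.
    """
    rows = []
    # zip(*members) regroups the values so each tuple holds all
    # ensemble values at one depth.
    for values_at_depth in zip(*members):
        q1, _, q3 = statistics.quantiles(values_at_depth, n=4, method="inclusive")
        rows.append({
            "q1": q1,
            "median": statistics.median(values_at_depth),
            "q3": q3,
        })
    return rows
```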
 +
 
 +
To add one of these tables to an existing dataset:
 +
#Save the values in .csv format with depth in the first column. Name the file according to the [[#Nomenclature | nomenclature below]]. Upload your file by clicking on 'Upload File' in the left sidebar. The wiki should automatically re-direct you to the page dedicated to this particular file on the wiki. Keep this page open, you'll need it for step 6.
 +
#Look for the variable the ensemble applies to. In the example in the figures, the ensemble table contains realizations of [[Sea Surface Temperature]] series that should be linked to [[PYT7DGGKKES.sst | this variable]].
 +
# Create a new entry for property [[:Property:FoundInTable (L) | FoundInTable (L)]] by clicking on the previous and enter the name of the ensemble table following the [[#Nomenclature | nomenclature]].
 +
# Navigate to the page
 +
## Create the model page using the [[:Property:GeneratedByModel (L) | GeneratedByModel (L)]] and following the proper [[#Nomenclature | nomenclature]].
 +
## Create a page for each variable using the proper [[#Nomenclature | nomenclature]]. For ensemble tables, there should only be two variables: depth, stored in the first column, and the variable corresponding to the ensemble members, stored in columns 2-N, where N represents the number of realizations. See [[TestDataset.Paleo1.Model1.EnsembleTable1 | this page]] for an example.
 +
# To link the .csv file to the metadata, click on 'Provide Data Name' at the top of the page.
 +
# Enter the name of the csv file, as shown at the top of the file page after upload (ignoring the word 'file'), and click go.
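
The .csv layout described in step 1 (depth in the first column, one ensemble member per subsequent column) can be produced with Python's standard <code>csv</code> module. The sketch below is illustrative only: the helper name <code>write_ensemble_csv</code> is hypothetical, and writing the file without a header row is an assumption, since this guide does not say whether a header is expected. The limit of 1000 members follows the recommendation above.

```python
import csv

MAX_MEMBERS = 1000  # recommended upper limit for ensemble members on the wiki

def write_ensemble_csv(path, depths, members):
    """Write an ensemble table: depth in column 1, members in columns 2+.

    depths:  list of depth values
    members: list of realizations, each a list with one value per depth
    Assumption: no header row is written.
    """
    if len(members) > MAX_MEMBERS:
        raise ValueError(f"more than {MAX_MEMBERS} ensemble members")
    for member in members:
        if len(member) != len(depths):
            raise ValueError("each member needs one value per depth")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for i, depth in enumerate(depths):
            # One row per depth: the depth, then each member's value there.
            writer.writerow([depth] + [member[i] for member in members])
```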
  
 
==Compilation==
 
  
== Nomenclature ==
+
The [[:Property:PartOfCompilation (L) | PartofCompilation Property]] can be used to link a particular [[:Category:Dataset (L) | dataset]] to a [[:Category:Compilation (L) | compilation]], in which the [[:Category:Dataset (L) | dataset]] has been used (e.g., the [[PAGES2k]] compilation). To identify the particular variable used in the compilation, you can add a property to this particular variable to signal it's been used in the compilation (e.g., the [[PAGES2k]] consortium used the property [[:Property:UseInPAGES2kGlobalTemperatureAnalysis | UseInPAGES2kGlobalTemperatureAnalysis]] to identify the specific variable).
  
=== Dataset Name ===
+
===Compilation Page ===
  
The naming convention for datasets is SiteName.FirstAuthor.PubYear. For instance, for this [[MD982181.Khider.2014 | dataset]], the SiteName is MD98-2181 (the name of the marine sediment core), the first author is Khider, and the dataset was first published in 2014: MD982181.Khider.2014, as shown in Figure 1.
+
The [[:Category:Compilation (L) | compilation]] page has standard properties:
 +
* [[:Property:Author (L) | The author]] of the [[:Category:Compilation (L) | compilation]]. An author can be a consortium (e.g., PAGES2k consortium)
 +
* [[:Property:FundedBy (L) | Funding information]]
 +
* [[Property:ModeledBy (L) | The output]] of the [[:Category:Compilation (L) | compilation]]. The [[:Category:Model (L) | model category]] can store various tables, including [[:Category:EnsembleTable (L) | ensemble tables]] and [[:Category:SummaryTable (L) | summary tables]].
 +
* [[:Property:CompilationCitation (L) | A citation]] for the [[:Category:Compilation (L) | compilation]]. This is different from the [[:Category:Publication (L) | references]], which can be entered by adding an extra property [[:Property:PublishedIn (L) | PublishedIn (L)]].
 +
* [[:Property:CompilationDate (L) | The date]] at which the [[:Category:Compilation (L) | compilation]] was published.
 +
* [[:Property:CompilationVersion (L) | The version]] of the [[:Category:Compilation (L) | compilation]]. The versioning scheme is left to the discretion of the [[:Property:Author (L) | authors]] contributing the work but should be explained on the page (for instance, see the [[PAGES2k]] page).
  
[[File:DatasetName.png|400px|thumb|right|DatasetName at the top of a Dataset Page on the LinkedEarth wiki.]]
+
=== Compilation Products ===
  
If uploading from a LiPD file, the dataset name should be automatically filled out for you.
+
The products of a compilation (for instance, the benthic d18O stack for [[LR04 benthic stack]] or [[Prob-stack]]) can be stored directly on the wiki by uploading a text or Excel file, or stored externally. To link to the file on the wiki or an external database, use the [[:Property:HasLink (L) | HasLink (L)]] property.
  
Following this convention from the start is important since many of the pages are named after the DatasetName.
+
Alternatively, the results can be stored in [[:Category:EnsembleTable (L) | ensemble tables]] or [[:Category:SummaryTable (L) | summary tables]] and linked accordingly.
  
=== Publication ===
+
== Nomenclature ==
  
Publications are identified on the wiki using their DOI. For instance, the Anand et al. (2003) publication corresponds to the page [[Publication.10.1029/2002PA000846]].
+
Some pages on the wiki require a specific name to ensure a unique URL, while others can be common to several datasets. For instance, when referring to the measurements stored in one of the columns of the csv file, the page name needs to be unique to that dataset. On the other hand, for the type of measurement (for instance, [[D18O | &delta;<sup>18</sup>O]]), the page needs to be common to all datasets using the concept of [[D18O | &delta;<sup>18</sup>O]] measurements.
 +
 
 +
For this reason, we implemented a guide to name wiki pages if you want to contribute a [[:Category:Dataset (L) | dataset]] directly on the wiki without going through a [[Linked Paleo Data | LiPD]] file (not recommended) or want to add information to an existing [[:Category:Dataset (L) | dataset]] (for instance, adding information about [[:Category:Uncertainty (L) | uncertainty]], the [[:Category:PhysicalSample (L) | physical sample]], or the [[:Category:Interpretation (L) | interpretation]]).
 +
 
 +
If you need help editing a dataset or are unsure about the proper nomenclature, [mailto:linkedearth@gmail.com contact us].
 +
 
 +
Remember that (L) refers to a "core" or LiPD category/property in the [[LinkedEarth Ontology]]. As such, these categories and properties cannot be changed by basic editors on the wiki.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|+Guide to name wiki pages
 +
|-
 +
! Category || Property Linking to Category || Suggested Name || Example
 +
|-
 +
| [[:Category:Dataset (L) | Dataset (L)]] || N/A || SiteName.FirstAuthor.Year || [[MD982176.Stott.2004]]
 +
|-
 +
| [[:Category:Location (L) | Location (L)]] || [[:Property:CollectedFrom (L) | CollectedFrom (L)]] || DatasetName.Location || [[MD982176.Stott.2004.Location]]
 +
|-
 +
| [[:Category:Person (L) | Person (L) ]] || [[:Property:Contributor (L) | Contributor (L)]], [[:Property:Author (L) | Author (L) ]] || FirstInitial. MiddleInitial. LastName || [[L. Stott]]
 +
|-
 +
| [[:Category:Publication (L) | Publication (L)]] || [[:Property:PublishedIn (L) | PublishedIn (L)]] || DOI available: Publication.doi
 +
DOI unavailable: Publication.Dataset 
 +
|| [[Publication.10.0138/nature02903]]
 +
|-
 +
| [[:Category: Funding (L) | Funding (L) ]] || [[:Property:FundedBy (L) | FundedBy (L)]] || Agency.GrantNumber || [[National_Science_Foundation.AGS#1049238]]
 +
|-
 +
| [[:Category:ChronData (L) | ChronData (L)]]
 +
OR
 +
[[:Category:PaleoData (L) | PaleoData (L)]]
 +
|| [[:Property:IncludesChronData (L) | IncludesChronData (L)]]
 +
[[:Property:IncludesPaleoData (L) | IncludesPaleoData (L)]] 
 +
|| DatasetName.ChronData+ChronDataNumber
 +
DatasetName.PaleoData+PaleoDataNumber
 +
|| [[MD982181.Khider.2014.ChronData1]]
 +
[[MD982181.Khider.2014.PaleoData1]]
 +
|-
 +
| [[:Category:Model (L) | Model (L)]] || [[:Property:ModeledBy (L) | ModeledBy (L)]] || Paleo(orChron)DataName.Model+ModelNumber || [[MD982181.Khider.2014.PaleoData1.Model1]]
 +
|-
 +
| [[:Category:MeasurementTable (L) | MeasurementTable (L)]] || [[:Property:FoundInMeasurementTable (L) | FoundInMeasurementTable (L)]] || DatasetName.PaleoNumber+'measurement'+tableNumber
 +
OR DatasetName.ChronNumber+'measurement'+tableNumber
 +
|| [[MD982181.Khider.2014.paleo1measurement1]]
 +
|-
 +
| [[:Category:Uncertainty (L) | Uncertainty (L)]] || [[:Property:HasUncertainty (L) |HasUncertainty (L) ]] || PageName.Uncertainty+UncertaintyNumber
 +
e.g., VariableID.Name.Uncertainty+UncertaintyNumber
 +
LocationOfInstrument.PIName.InstrumentType.Uncertainty+UncertaintyNumber
 +
||
 +
[[PYTES973TGM.sst.Uncertainty1]]
 +
[[USC.Stott.ICPAES.Uncertainty1]]
 +
|-
 +
| [[:Category:Variable (L) | Variable (L)]] || [[:Property:IncludesVariable (L) | IncludesVariable (L)]] || VariableID.Name || [[PYTES973TGM.sst]]
 +
|-
 +
| colspan="4"  style="text-align:center; background-color:#F2F2F2;" | [[:Category:Variable (L) | Variable (L)]]
 +
|-
 +
| [[:Category:MeasurementTable (L) | MeasurementTable (L)]]
 +
OR [[:Category:EnsembleTable (L) | EnsembleTable (L)]]
 +
OR [[:Category:SummaryTable (L) | SummaryTable (L)]]
 +
OR [[:Category:DistributionTable (L) | DistributionTable (L)]]
 +
|| [[:Property:FoundInTable (L) | FoundInTable (L)]] || DatasetName.Paleo(orChron)Number+'measurement'+tableNumber
 +
OR DatasetName.Paleo(orChron)Number+modelNumber+ensemble
 +
OR DatasetName.Paleo(orChron)Number+modelNumber+summary
 +
OR DatasetName.Paleo(orChron)Number+modelNumber+distribution
 +
|| [[MD982181.Khider.2014.paleo1measurement1]]
 +
[[MD982181.Khider.2014.paleo1model1ensemble]]
 +
[[MD982181.Khider.2014.paleo1model1summary]]
 +
|-
 +
| [[:Category:Resolution (L) | Resolution (L) ]] || [[:Property:HasResolution (L) | HasResolution (L)]] || VariableID.Name.Resolution || [[PYT6K1XJRVM.mg/ca-g.rub-w.Resolution]]
 +
|-
 +
| [[:Category:Interpretation (L) | Interpretation (L)]] || [[:Property:InterpretedAs (L) | InterpretedAs (L)]] || VariableID.Name.Interpretation+InterpretationNumber || [[PYT6K1XJRVM.mg/ca-g.rub-w.Interpretation1]]
 +
|-
 +
| colspan="4"  style="text-align:center; background-color:#F2F2F2;" | [[:Category:MeasuredVariable (L) | MeasuredVariable (L)]]
 +
|-
 +
| [[:Category:ProxySystem (L) | ProxySystem (L)]]  || [[:Property:HasProxySystem (L) | HasProxySystem (L)]] || ProxySystem.ArchiveType.Sensor.ProxyObservation || [[ProxySystem.MarineSediment.Globigerinoides_ruber.Mg/Ca]]
 +
|-
 +
| [[:Category:Instrument (L) | Instrument (L)]] || [[:Property:MeasuredBy (L) | MeasuredBy (L)]] || LocationofInstrument.PIName.InstrumentType
 +
If more than one instrument of the same type, use 1,2,3,....
 +
|| [[USC.Stott.ICPAES]]
 +
|-
 +
| [[:Category:ProxyArchive (L) | ProxyArchive (L)]]
 +
AND [[:Category:PhysicalSample (L) | PhysicalSample (L)]]
 +
|| [[:Property:MeasuredOn (L) | MeasuredOn (L)]] || ArchiveName || [[MD98-2181]]
 +
|-
 +
| colspan="4"  style="text-align:center; background-color:#F2F2F2;" | [[:Category:InferredVariable (L) | InferredVariable (L)]]
 +
|-
 +
| [[:Category:CalibrationModel (L) | CalibrationModel (L)]] || [[:Property:CalibratedVia (L) | CalibratedVia (L)]] || VariableID.VariableName.Calibration || [[PYTES973TGM.sst.Calibration]]
 +
|}
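
Several of the suggested names in the table are built by appending a suffix to the dataset name, which makes them easy to generate programmatically. The following sketch is illustrative only: the helper functions are hypothetical, the dataset name follows the order used in the table's examples (site, first author, year), and the removal of hyphens from the site name is inferred from the MD98-2181 example rather than stated as a rule.

```python
# Hypothetical helpers that assemble page names following the patterns
# shown in the table's examples.

def dataset_name(site, first_author, year):
    # Assumption inferred from the examples: hyphens are dropped from
    # the site name (MD98-2181 becomes MD982181).
    return f"{site.replace('-', '')}.{first_author}.{year}"

def data_name(dataset, kind, number):
    # kind is 'Paleo' or 'Chron', giving e.g. DatasetName.PaleoData1
    return f"{dataset}.{kind}Data{number}"

def model_name(data, number):
    # e.g. DatasetName.PaleoData1.Model1
    return f"{data}.Model{number}"

def measurement_table_name(dataset, kind, data_number, table_number):
    # e.g. DatasetName.paleo1measurement1
    return f"{dataset}.{kind.lower()}{data_number}measurement{table_number}"

def publication_name(doi):
    # Publications with a DOI are named Publication.doi
    return f"Publication.{doi}"


dataset = dataset_name("MD98-2181", "Khider", 2014)
paleo = data_name(dataset, "Paleo", 1)
print(dataset)                                             # MD982181.Khider.2014
print(paleo)                                               # MD982181.Khider.2014.PaleoData1
print(model_name(paleo, 1))                                # MD982181.Khider.2014.PaleoData1.Model1
print(measurement_table_name(dataset, "Paleo", 1, 1))      # MD982181.Khider.2014.paleo1measurement1
```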

Latest revision as of 18:40, 9 May 2017

By design, the LinkedEarth wiki is a collaborative platform to edit paleoclimate datasets and contribute knowledge about the field. As such, anyone within the LinkedEarth community can edit datasets and most of the pages on this wiki (with the exception of pages with a copyright sign; see this page for an explanation). This page is meant as a best practice guide for creating new pages and modifying existing ones. Specifically, we propose guidelines for:

  • Editing existing datasets by third-party contributors
  • Naming pages with a unique identifier
  • Versioning datasets following changes to model outputs (e.g., inferring new temperatures from existing raw measurements) and changes to the raw measurements.

We expect this guide to be updated often as new datasets are added and needs arise, so please check for updates regularly.

Datasets

The following section aims to provide guidelines on creating new datasets or editing existing wiki pages, including datasets used in compilations.

Shortcuts to the most often asked questions
Question Link to Answer
What constitutes a dataset? See this page.
What constitutes a data table? See this page.
Updating datasets following a compilation This section
Updating datasets following the creation of a new model output This section
Updating datasets following the creation of new raw measurements This section

New vs legacy datasets

New datasets are datasets that have recently been published and are often contributed by the original contributor of the study or someone closely associated with the creation of the datasets. This definition also includes older datasets that the PI may have placed on other public databases or has not yet uploaded anywhere. In this instance, the contributors and the LinkedEarth member uploading the dataset may be the same person. Therefore, most of the metadata fields can be filled in by the person who was involved in the study, since they likely have the information readily available.

Legacy datasets are datasets that are publicly available (i.e., either on another database or published under U.S. funding) and are contributed by a LinkedEarth member not originally involved in the creation of the dataset. For datasets that are not publicly available (i.e., emailed directly to the LinkedEarth member by the original contributors), we recommend informing the contributors of your intent to upload their dataset on the LinkedEarth wiki.

The guidelines suggested below apply to both new and legacy datasets.

Versioning system

One of the properties of a dataset is the dataset version. In LinkedEarth, the dataset version follows the x.y.z notation where:

  • x refers to changes in metadata and data following a publication. Examples of such changes include the creation of a new age model as part of a compilation or comparison or changes in the way a measured variable is calibrated to obtain an inferred variable (i.e. applying a different calibration model).
  • y refers to changes to the data following a publication. Examples include adding data further back in time without changing the model underlying the interpretation.
  • z refers to changes not associated with a publication. Examples include fixing typos and adding metadata taken either from the publication or from the original contributor of the data (e.g., information from a laboratory notebook).

After the initial upload, set the dataset version to '0.0.0'.
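
As an illustration of the x.y.z notation, here is a minimal Python sketch of a version bump. The function name `bump_version` is hypothetical and not part of any LinkedEarth or LiPD tool. The sketch also assumes that incrementing a field resets the fields to its right to zero, as in semantic versioning; this page does not specify that behavior.

```python
def bump_version(version: str, level: str) -> str:
    """Return the next dataset version string in x.y.z notation.

    level is one of:
      "x" - metadata and data changed following a publication
      "y" - data changed following a publication
      "z" - change not associated with a publication (typo, added metadata)
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected x.y.z with integer fields, got {version!r}")
    x, y, z = (int(p) for p in parts)

    # Assumption: fields to the right of the incremented one reset to zero.
    if level == "x":
        return f"{x + 1}.0.0"
    if level == "y":
        return f"{x}.{y + 1}.0"
    if level == "z":
        return f"{x}.{y}.{z + 1}"
    raise ValueError(f"level must be 'x', 'y', or 'z', got {level!r}")


print(bump_version("0.0.0", "z"))  # prints 0.0.1 (e.g., a typo fix)
print(bump_version("0.0.1", "y"))  # prints 0.1.0 (e.g., data extended back in time)
```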

Note: The dataset version is different from the compilation version. The versioning system of each compilation is left at the discretion of the group who created the compilation but should be explained on the compilation page.

Uploading a dataset for the first time on the wiki

We strongly recommend first creating a LiPD file rather than entering all the data and metadata from scratch on the wiki. As of April 2017, the most expeditious way to convert your data into the LiPD format is to use our Excel Template (File:LiPDv1.2 template.xlsx) and the Python LiPD Utilities. This guide will assist you in entering the necessary data and metadata information.

Once your dataset is in LiPD format, you can upload it on the wiki. This will automatically create most of the pages. Check that all the information is correct and once satisfied, update the dataset version to '0.0.0'.

If you decide to enter a dataset manually (not recommended):

  1. Upload your data in csv format using the 'Upload File' link in the sidebar. Make sure you name the files appropriately by referring to the nomenclature section on this page. The wiki will suggest names for you to use.
  2. Create a new page using the name SiteName.DatasetYear.ContributorName and set the Category of the new page to Category:Dataset (L). Note: To be able to create a page, you need to enter some text in the WikiText box. You'll be able to delete this extra text from the page after you create it by clicking on edit at the top of the page.
  3. The wiki will automatically suggest standard properties. Answer as many as possible. Note: If the answer to a Property results in the creation of a new class (i.e., the box doesn't specify text or number), then you'll be essentially creating a new wiki page. Follow our nomenclature. If you make a typo, just fixing the typo in the link will not automatically redirect the page. The best approach is to rename the landing page.

Changes to a dataset already on the wiki

For existing datasets, we recommend updating the data and metadata directly on the wiki rather than uploading a new LiPD file.

All changes to a dataset after the initial upload require a change in the version of the file as outlined here. If you are planning to make a series of updates over the course of several days as part of the same work, only update the dataset version once you're through with all the changes.

Changing existing data

Downloading a csv file from the wiki
Uploading a new version of a file on the wiki

Only the original contributor to the data and the person uploading the dataset can overwrite the original csv file.

If the change requires creating another column or changing the underlying calibration, you should follow the instructions on adding data tables.

To update data:

  1. Go to this page and search for the name of the csv file you need to update.
  2. Download the contributed csv file onto your computer by right-clicking on the name.
  3. Make the necessary corrections to the file and save it, using the same file name.
  4. To re-upload to the wiki, go back to the file page from which you originally downloaded the file.
  5. Click on Upload a new version of this file at the bottom of the page.
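
Step 3 can also be done programmatically. The sketch below uses only the Python standard library to correct a single cell and save the result under the same file name, so that the file can be re-uploaded as a new version. The function name `correct_csv_cell` and the file name in the usage comment are hypothetical.

```python
import csv
import os
import tempfile


def correct_csv_cell(path, row_index, col_index, new_value):
    """Replace one cell of a csv file and save it under the same file name."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

    rows[row_index][col_index] = new_value

    # Write to a temporary file in the same folder, then swap it in, so an
    # interrupted write cannot leave a half-written csv behind.
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    os.replace(tmp_path, path)


# Hypothetical usage: fix the value in the third row, second column.
# correct_csv_cell("MD982181.Khider.2014.paleo1measurement1.csv", 2, 1, "28.1")
```

Row and column indices are zero-based and count every line of the file, including a header row if the file has one.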

Changing existing metadata

The LinkedEarth wiki is meant as a collaborative platform for the curation of paleoclimate data. As such, anyone with basic editor privileges (i.e., a LinkedEarth member) can edit wiki pages.

If you're concerned about changes to your own dataset, please remember that you will receive a notification email when the pages are updated by another member of the community. If you do not agree with the changes being made to your dataset, we suggest the following:

  1. Contact the LinkedEarth member who made the change, using the discussion page for the wiki page.
  2. If you do not receive an answer within 7 business days, try contacting the user by email.
  3. If you cannot resolve the issue within 30 business days, contact us.

Remember that these changes could be as simple as typo corrections and may be done automatically by the LinkedEarth team to bring your dataset up to date with the current ontology. See the Proxy Archive Ontology, the Proxy Observation Ontology, the Proxy Sensor Ontology, the Inferred Variable Ontology, and the Instrument Ontology for details.

Adding metadata only

You can add metadata easily on the wiki. The addition of metadata does not necessarily have to follow a publication. For instance, one LinkedEarth member can upload a legacy dataset in May 2017. In October 2017, another member, perhaps more familiar with the study, may add further information. As previously mentioned, the member who originally uploaded the dataset will receive an email that these pages have been changed. We anticipate such changes when not all the information, especially that pertaining to instrumentation, uncertainty, or the physical sample, was available in the original manuscript.

Another example involves the same contributor uploading the minimal required data and metadata in haste to meet journal requirements and deadlines, then deciding months later to add the recommended and desired metadata.

Adding data

Note: Adding data will automatically create new metadata.

If the updates involve extensive changes to measured variables, inferred variables, or models (e.g., age model, calibration) following a new publication, we recommend creating a new LiPD file, and consequently a new dataset page. The new dataset should contain information relating to the first publication and a note explaining the relationship between the two datasets.

Adding Ensemble, distribution and summary tables

Uploading a .csv file on the wiki
Adding an ensemble table page to the wiki
Location of the 'Provide Data Name' button on the wiki page.
Enter the name of the Excel file
Completed EnsembleDataTable page

If a Bayesian computation was used to derive the inferred variables (an increasingly popular method for age modeling thanks to packages such as Bchron and Bacon), as many as three tables can be generated:

  • Ensemble tables are used to store members of the ensemble. We recommend not storing more than 1000 individual members on the wiki.
  • Summary tables store statistics of the ensemble such as the median, quartiles, and quantiles.
  • Distribution tables store the output distribution for a specific horizon. For instance, the calendar age distribution at a specific horizon where radiocarbon was measured.
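
To make the relationship between ensemble and summary tables concrete, the sketch below derives summary rows (first quartile, median, third quartile) from an ensemble using only the Python standard library. The function names are hypothetical, and the choice of the "inclusive" quantile method is an assumption; use whichever definition matches your analysis. The thinning helper simply keeps the first realizations, which is also an assumption. A random subsample may be more appropriate.

```python
import statistics

MAX_MEMBERS = 1000  # recommended upper bound on members stored on the wiki


def thin_ensemble(ensemble, max_members=MAX_MEMBERS):
    """Keep at most max_members realizations per horizon (the first ones)."""
    return [members[:max_members] for members in ensemble]


def summarize_ensemble(depths, ensemble):
    """Return one (depth, q1, median, q3) row per horizon.

    ensemble[i] holds all realizations of the variable at depths[i]
    and must contain at least two values.
    """
    if len(depths) != len(ensemble):
        raise ValueError("depths and ensemble must have the same length")
    rows = []
    for depth, members in zip(depths, ensemble):
        # n=4 asks for the three cut points that split the data into quartiles.
        q1, median, q3 = statistics.quantiles(members, n=4, method="inclusive")
        rows.append((depth, q1, median, q3))
    return rows
```

For a single horizon at depth 10 with realizations 1 through 5, `summarize_ensemble([10], [[1, 2, 3, 4, 5]])` returns one row with quartiles 2.0, 3.0, and 4.0.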

To add one of these tables to an existing dataset:

  1. Save the values in .csv format with depth in the first column. Name the file according to the nomenclature below. Upload your file by clicking on 'Upload File' in the left sidebar. The wiki should automatically re-direct you to the page dedicated to this particular file on the wiki. Keep this page open, you'll need it for step 6.
  2. Look for the variable the ensemble applies to. In the example in the figures, the ensemble table contains realizations of Sea Surface Temperature series that should be linked to this variable.
  3. Create a new entry for property FoundInTable (L) by clicking on the previous and enter the name of the ensemble table following the nomenclature.
  4. Navigate to the page
    1. Create the model page using the GeneratedByModel (L) and following the proper nomenclature.
    2. Create a page for each variable using the proper nomenclature. For ensemble tables, there should only be two variables: depth, stored in the first column, and the variable corresponding to the ensemble members, stored in columns 2-N, where N represents the number of realizations. See this page for an example.
  5. To link the .csv file to the metadata, click on 'Provide Data Name' at the top of the page.
  6. Enter the name of the csv file, as shown at the top of the file page after upload (ignoring the word 'file'), and click go.
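
The file layout described in step 1 (depth in the first column, one realization per subsequent column) can be produced with a short script. This is a sketch using the Python standard library; the function name `write_ensemble_csv` is hypothetical, and writing the file without a header row is an assumption, since this page does not say whether one is expected.

```python
import csv


def write_ensemble_csv(path, depths, ensemble):
    """Write an ensemble table: depth first, then one column per realization.

    ensemble[i] is the list of realizations at depths[i]; every horizon
    must have the same number of realizations.
    """
    if len(depths) != len(ensemble):
        raise ValueError("depths and ensemble must have the same length")
    if len({len(members) for members in ensemble}) > 1:
        raise ValueError("every horizon needs the same number of realizations")

    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for depth, members in zip(depths, ensemble):
            writer.writerow([depth, *members])  # assumption: no header row


# Hypothetical usage, with a file name that follows the nomenclature:
# write_ensemble_csv("MD982181.Khider.2014.paleo1model1ensemble.csv",
#                    depths=[0.5, 1.5],
#                    ensemble=[[27.1, 27.3], [26.8, 26.9]])
```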

Compilation

The PartofCompilation Property can be used to link a particular dataset to a compilation, in which the dataset has been used (e.g., the PAGES2k compilation). To identify the particular variable used in the compilation, you can add a property to this particular variable to signal it's been used in the compilation (e.g., the PAGES2k consortium used the property UseInPAGES2kGlobalTemperatureAnalysis to identify the specific variable).

Compilation Page

The compilation page has standard properties:

Compilation Products

The products of a compilation (for instance, the benthic d18O stack for the LR04 benthic stack or Prob-stack) can be stored directly on the wiki by uploading a text or Excel file, or stored externally. To link to the file on the wiki or in an external database, use the HasLink (L) property.

Alternatively, the results can be stored in ensemble tables or summary tables and linked accordingly.

Nomenclature

Some pages on the wiki require a specific name to ensure a unique URL, while others can be shared by several datasets. For instance, a page referring to the measurements stored in one column of the csv file needs a name unique to that dataset. On the other hand, the page for the type of measurement (for instance, δ18O) needs to be common to all datasets using the concept of δ18O measurements.

For this reason, we implemented a guide to name wiki pages if you want to contribute a dataset directly on the wiki without going through a LiPD file (not recommended) or want to add information to an existing dataset (for instance, adding information about uncertainty, the physical sample, or the interpretation).

If you need help editing a dataset or are unsure about the proper nomenclature, contact us.

Remember that (L) refers to a "core" or LiPD category/property in the LinkedEarth Ontology. As such, these categories and properties cannot be changed by basic editors on the wiki.

Guide to name wiki pages
Category | Property Linking to Category | Suggested Name | Example
Dataset (L) | N/A | SiteName.Year.FirstAuthor | MD982176.Stott.2004
Location (L) | CollectedFrom (L) | DatasetName.Location | MD982176.Stott.2004.Location
Person (L) | Contributor (L), Author (L) | FirstInitial. MiddleInitial. LastName | L. Stott
Publication (L) | PublishedIn (L) | DOI available: Publication.doi; DOI unavailable: Publication.Dataset | Publication.10.0138/nature02903
Funding (L) | FundedBy (L) | Agency.GrantNumber | National_Science_Foundation.AGS#1049238
ChronData (L) OR PaleoData (L) | IncludesChronData (L); IncludesPaleoData (L) | DatasetName.ChronData+ChronDataNumber; DatasetName.PaleoData+PaleoDataNumber | MD982181.Khider.2014.ChronData1; MD982181.Khider.2014.PaleoData1
Model (L) | ModeledBy (L) | Paleo(orChron)DataName.Model+ModelNumber | MD982181.Khider.2014.PaleoData1.Model1
MeasurementTable (L) | FoundInMeasurementTable (L) | DatasetName.PaleoNumber+'measurement'+tableNumber OR DatasetName.ChronNumber+'measurement'+tableNumber | MD982181.Khider.2014.paleo1measurement1
Uncertainty (L) | HasUncertainty (L) | PageName.Uncertainty+UncertaintyNumber (e.g., VariableID.Name.Uncertainty+UncertaintyNumber; LocationOfInstrument.PIName.InstrumentType.Uncertainty+UncertaintyNumber) | PYTES973TGM.sst.Uncertainty1; USC.Stott.ICPAES.Uncertainty1
Variable (L) | IncludesVariable (L) | VariableID.Name | PYTES973TGM.sst
Variable (L)
MeasurementTable (L) OR EnsembleTable (L) OR SummaryTable (L) OR DistributionTable (L) | FoundInTable (L) | DatasetName.Paleo(orChron)Number+'measurement'+tableNumber OR DatasetName.Paleo(orChron)Number+modelNumber+ensemble OR DatasetName.Paleo(orChron)Number+modelNumber+summary OR DatasetName.Paleo(orChron)Number+modelNumber+distribution | MD982181.Khider.2014.paleo1measurement1; MD982181.Khider.2014.paleo1model1ensemble; MD982181.Khider.2014.paleo1model1summary
Resolution (L) | HasResolution (L) | VariableID.Name.Resolution | PYT6K1XJRVM.mg/ca-g.rub-w.Resolution
Interpretation (L) | InterpretedAs (L) | VariableID.Name.Interpretation+InterpretationNumber | PYT6K1XJRVM.mg/ca-g.rub-w.Interpretation1
MeasuredVariable (L)
ProxySystem (L) | HasProxySystem (L) | ProxySystem.ArchiveType.Sensor.ProxyObservation | ProxySystem.MarineSediment.Globigerinoides_ruber.Mg/Ca
Instrument (L) | MeasuredBy (L) | LocationofInstrument.PIName.InstrumentType (if more than one instrument of the same type, use 1,2,3,....) | USC.Stott.ICPAES
ProxyArchive (L) AND PhysicalSample (L) | MeasuredOn (L) | ArchiveName | MD98-2181
InferredVariable (L)
CalibrationModel (L) | CalibratedVia (L) | VariableID.VariableName.Calibration | PYTES973TGM.sst.Calibration
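
Several of the naming patterns in the guide above can be expressed as small helper functions. The Python sketch below is illustrative only: the function names are hypothetical and not part of the LiPD utilities, and the dataset name is passed in as-is rather than assembled from its parts.

```python
def _check_kind(kind):
    """The guide distinguishes paleo and chron data."""
    if kind not in ("paleo", "chron"):
        raise ValueError(f"kind must be 'paleo' or 'chron', got {kind!r}")


def data_page_name(dataset, kind, number):
    """DatasetName.PaleoData+PaleoDataNumber (or ChronData)."""
    _check_kind(kind)
    return f"{dataset}.{kind.capitalize()}Data{number}"


def model_page_name(data_page, model_number):
    """Paleo(orChron)DataName.Model+ModelNumber."""
    return f"{data_page}.Model{model_number}"


def measurement_table_name(dataset, kind, data_number, table_number):
    """DatasetName.Paleo(orChron)Number+'measurement'+tableNumber."""
    _check_kind(kind)
    return f"{dataset}.{kind}{data_number}measurement{table_number}"


def model_table_name(dataset, kind, data_number, model_number, table_type):
    """DatasetName.Paleo(orChron)Number+modelNumber+ensemble (or summary, distribution)."""
    _check_kind(kind)
    if table_type not in ("ensemble", "summary", "distribution"):
        raise ValueError(f"unknown table type {table_type!r}")
    return f"{dataset}.{kind}{data_number}model{model_number}{table_type}"


def uncertainty_page_name(page, number):
    """PageName.Uncertainty+UncertaintyNumber."""
    return f"{page}.Uncertainty{number}"


dataset = "MD982181.Khider.2014"
print(data_page_name(dataset, "paleo", 1))
print(model_page_name(data_page_name(dataset, "paleo", 1), 1))
print(measurement_table_name(dataset, "paleo", 1, 1))
print(model_table_name(dataset, "paleo", 1, 1, "ensemble"))
print(uncertainty_page_name("PYTES973TGM.sst", 1))
```

The printed names reproduce the corresponding entries in the Example column of the guide.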