Category:Historical Documents Working Group

From Linked Earth Wiki
Revision as of 02:43, 26 April 2019 by KMicha



Overview

In the Linked Earth context, a working group (WG) is a self-organized coalition of knowledgeable experts, whose activities are governed herewith. This page is dedicated to the discussion of data and metadata standards for historical documents and aims to formulate a set of recommendations for such a standard.

Members of 'Historical Documents Working Group'

    This working group contains only the following member.


Sources

Data are usually compiled from several historical sources. The LiPD data structure supports multiple Publication entries, which are normally used to refer to the publication describing the data. This data cluster can therefore also hold the historical sources, in addition to the publication describing the data. It is related to established standards such as Dublin Core and BibTeX.

  • Source-ID for later reference
  • Source Type (string), e.g. newspaper, book, ...
  • Source Author
  • Source Title
  • Publication Year
  • Publication Date
  • Journal Title
  • Publisher
  • URL to Source (e.g. to a PDF)
  • DOI of Source (rarely exists for historical documents unless they have been published as a data compilation)
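A minimal Python sketch of such a source record, filled with placeholder values. The field names are illustrative assumptions, not part of the LiPD specification, and the hypothetical `to_bibtex` helper only shows how the fields line up with BibTeX; the mapping from source type to BibTeX entry type is likewise assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical record for one historical source (field names are illustrative)."""
    source_id: str                 # identifier for later reference
    source_type: str               # e.g. "newspaper", "book", ...
    author: Optional[str] = None
    title: Optional[str] = None
    year: Optional[int] = None     # publication year
    date: Optional[str] = None     # publication date as an ISO 8601 string
    journal: Optional[str] = None
    publisher: Optional[str] = None
    url: Optional[str] = None      # e.g. link to a PDF
    doi: Optional[str] = None      # rarely available for historical documents

    def to_bibtex(self) -> str:
        """Render a minimal BibTeX entry; the type mapping below is an assumption."""
        entry_type = {"book": "book", "newspaper": "article"}.get(self.source_type, "misc")
        fields = {
            "author": self.author, "title": self.title, "year": self.year,
            "journal": self.journal, "publisher": self.publisher,
            "url": self.url, "doi": self.doi,
        }
        # Only fields that are actually set are written out.
        body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v is not None)
        return f"@{entry_type}{{{self.source_id},\n{body}\n}}"


src = Source("S1", "book", author="Anonymous", title="Town chronicle", year=1784)
print(src.to_bibtex())
```

A Dublin Core export could be written the same way by mapping the fields onto its elements (creator, title, date, publisher, identifier).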


Should the 'Source ID' be:

Should the 'Source Type' be:

Should the 'Author of Source' be:

Should the 'Title of Source' be:

Should the 'Publication Year' of the (original-) source be:

Should the 'Publication Date' of the (original-) source be:

Should the 'Journal' be:

Should the 'Publisher' be:

Should the 'URL to Source' be:

Should the 'DOI of Source' be:

Scans, Pages

Each source can optionally have a set of images, namely scans of its pages. These could perhaps be dropped from the LiPD format, with only references to externally stored images added to the quotes.

Quotes

Several quotes can be extracted from each source by transcription. The related data would most probably go into "measurementTables".

  • Quote-ID: for later reference
  • Reference to source: perhaps a short form like "Author (Year)" is adequate
  • Page: the page number(s) from which the quote is taken, as a string
  • Scan: optional link(s) to externally stored images of the page(s)
  • Language: the language of the quote (or of its translation)
  • Protolanguage: the language in which the quote was originally written
  • Quote: the quote itself (UTF-8)
  • License: license of the quote (e.g. Creative Commons)
  • DOI: the DOI under which the quote is published
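The quote fields listed above could be modelled as follows. This is a sketch under assumptions: the attribute names, the use of language codes such as `"de"` or `"en"`, and the helpers `short_ref` and `check_unique_ids` are all hypothetical. The page is kept as a string so that ranges like `"12-13"` or roman numerals fit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Quote:
    """Hypothetical transcribed quote taken from one source."""
    quote_id: str                        # unique within the dataset
    source_ref: str                      # short reference to the source, e.g. "Author (Year)"
    text: str                            # the quote itself (stored as UTF-8 when serialized)
    page: Optional[str] = None           # a string, so "12-13" or "xiv" are allowed
    scans: List[str] = field(default_factory=list)  # URLs of externally stored page images
    language: Optional[str] = None       # language of the quote or of its translation
    protolanguage: Optional[str] = None  # language in which the quote was originally written
    license: Optional[str] = None        # e.g. a Creative Commons license identifier
    doi: Optional[str] = None            # DOI under which the quote is published


def short_ref(author: str, year: int) -> str:
    """Build the short 'Author (Year)' reference used in source_ref."""
    return f"{author} ({year})"


def check_unique_ids(quotes: List[Quote]) -> None:
    """Raise ValueError if two quotes share the same quote_id."""
    seen = set()
    for q in quotes:
        if q.quote_id in seen:
            raise ValueError(f"duplicate quote_id: {q.quote_id}")
        seen.add(q.quote_id)


q1 = Quote("Q1", short_ref("Anonymous", 1784), "...", page="12-13",
           language="en", protolanguage="de")
check_unique_ids([q1])
```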


Should the 'Quote-ID' (unique for dataset) be:

Should the 'Reference to Source' be:

Should the Format of 'Reference to Source' be:

Should the 'Page of Quote' inside Source be:

Should the 'URL to Scan' of Page containing the Quote be:

Should the 'Language' of Quote or its Translation be:

Should the 'Protolanguage' of Quote be:

Should the 'Quote itself' be:

Should the 'License' of published Quote be:

Should the 'DOI' of published Quote be:

Events

Events, which are rather interpretations of the quotes, would go into the "model part". Each event refers to

  • a quote
  • a position
  • a time

and contains climate-related data (to be defined).
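A sketch of an event record that links a quote, a position and a time by ID, together with a small consistency check for references that do not resolve. Every name here is hypothetical, and the climate-related `coding` payload is left as an open dictionary because its structure is still to be defined.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple


@dataclass
class Event:
    """Hypothetical event: an interpretation of a quote at a position and a time."""
    event_id: str
    quote_id: str                      # the quote the event is interpreted from
    location_id: str                   # the position the event refers to
    time_begin: str                    # ISO 8601 date or date-time string
    time_end: Optional[str] = None     # for events spanning a period
    coding: Dict[str, object] = field(default_factory=dict)  # climate data, to be defined


def dangling_refs(events: Iterable[Event], quote_ids: Set[str],
                  location_ids: Set[str]) -> List[Tuple[str, str]]:
    """Return (event_id, field_name) pairs whose referenced ID does not exist."""
    problems = []
    for e in events:
        if e.quote_id not in quote_ids:
            problems.append((e.event_id, "quote_id"))
        if e.location_id not in location_ids:
            problems.append((e.event_id, "location_id"))
    return problems


events = [
    Event("E1", "Q1", "L1", "1816-07-01", coding={"temperature_level": -2}),
    Event("E2", "Q9", "L1", "1816-07-02"),
]
print(dangling_refs(events, quote_ids={"Q1"}, location_ids={"L1"}))
```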

Position

Unlike for other archive types, the position is not fixed: a compilation of several sources usually refers to various scattered locations. A location can usually be named, but it may cover different scales (continent, country, area, city, street, ...) and terrain types (city, river, sea, ...). It corresponds best to the geospatial metadata of LiPD; see also http://schema.org/Place.
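If the '(JSON)-Geometry' field uses GeoJSON (RFC 7946), which is an assumption here, a named point location could be serialized as a Feature as sketched below. GeoJSON orders coordinates as longitude, latitude and optional elevation, a common source of swapped values. The property names and the helper `position_feature` are hypothetical; the coordinates for Bern are approximate and purely illustrative.

```python
import json
from typing import Optional


def position_feature(location_id: str, name: str, lat: float, lon: float,
                     location_type: Optional[str] = None,
                     elevation: Optional[float] = None,
                     geonames_id: Optional[int] = None) -> dict:
    """Build a GeoJSON Point Feature for a named location."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    # GeoJSON position order: [longitude, latitude(, elevation)]
    coords = [lon, lat] if elevation is None else [lon, lat, elevation]
    return {
        "type": "Feature",
        "id": location_id,
        "geometry": {"type": "Point", "coordinates": coords},
        "properties": {"name": name, "locationType": location_type,
                       "geonamesId": geonames_id},
    }


bern = position_feature("L1", "Bern", lat=46.948, lon=7.447, location_type="city")
print(json.dumps(bern, indent=2))
```

Extended locations such as rivers or regions would use other GeoJSON geometry types (e.g. `LineString`, `Polygon`) instead of `Point`.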


Should the 'Identifier of Location' (unique for dataset) be:

Should the 'Name of Location' be:

Should the 'Type of Location' be:

Should the 'Latitude of Location' be:

Should the 'Longitude of Location' be:

Should the 'Altitude/Elevation of Location' be:

Should the '(JSON)-Geometry of Location' be:

Should the 'Reference to Geonames' by ID be:

Should the corresponding extracted quote piece ('Phrase') be:


Time

The time of an event derived from historical documents is usually more precise than for other archive types; the uncertainty is often only days or even hours. It would therefore be best to code the absolute date in the Gregorian calendar as an ISO 8601 string. Optionally, an absolute Gregorian year (possibly as a floating-point fraction rather than an integer) can be calculated as well.
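Both representations can be produced with Python's standard library, whose `datetime` uses the proleptic Gregorian calendar. In this sketch the fractional year is measured from 1 January 00:00 and divided by the actual length of the year (365 or 366 days); this convention is an assumption, and others (e.g. counting from mid-day, or a fixed 365.25-day year) give slightly different values. `decimal_year` is a hypothetical helper name.

```python
from datetime import datetime


def decimal_year(t: datetime) -> float:
    """Fractional Gregorian year of t, counted from 1 January 00:00 of t's year."""
    start = datetime(t.year, 1, 1)
    end = datetime(t.year + 1, 1, 1)
    return t.year + (t - start).total_seconds() / (end - start).total_seconds()


t = datetime(1816, 7, 1, 12)   # noon, 1 July 1816
print(t.isoformat())           # 1816-07-01T12:00:00
print(decimal_year(t))
```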

How should the event time primarily be defined:

The original sources often give dates in a calendar other than the Gregorian. The Julian calendar was used in Europe in historical times, and outside Europe other calendars existed or still exist, for example in Chinese or Arabic documents.
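For the Julian calendar, one way to convert is via the Julian Day Number (JDN). The sketch below uses the standard integer JDN formula for Julian-calendar dates and Python's `date.fromordinal`, whose day 1 (0001-01-01, proleptic Gregorian) is JDN 1721426. It assumes the year begins on 1 January; sources that start the year on another day (e.g. 25 March) would need adjusting first. Chinese, Islamic and other calendars need their own conversion rules, which are not shown here. Function names are hypothetical.

```python
from datetime import date


def julian_day_number(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the (proleptic) Julian calendar."""
    if not 1 <= month <= 12:
        raise ValueError("month out of range")
    # Julian leap rule: every fourth year, without the Gregorian century exception.
    days_in_month = [31, 29 if year % 4 == 0 else 28, 31, 30, 31, 30,
                     31, 31, 30, 31, 30, 31]
    if not 1 <= day <= days_in_month[month - 1]:
        raise ValueError("day out of range for the Julian calendar")
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to a (proleptic) Gregorian datetime.date."""
    # date.toordinal() counts 0001-01-01 (Gregorian) as day 1, i.e. JDN 1721426.
    return date.fromordinal(julian_day_number(year, month, day) - 1721425)


print(julian_to_gregorian(1582, 10, 5))   # 1582-10-15
```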

Should the time optionally be coded in 'Other Calendar Systems':

If so, which parameters for start and end should be considered?

Cycle:

Year:

Season / Solar Term:

Month:

Day:

Hour:

It is also possible to extract the time-related text from the quote and store it in a field 'phrase':

Phrase:


Event itself

The event refers to the quote, possibly directly to the source, to the chronology/timing, and to the position/location. It also holds the coding of the phenomena, which is where it gets complicated (see the next section).

Should the Event refer to 'Location' of described Phenomena:

Should the Event refer to 'Timing' of described Phenomena:

Should the Event refer to the 'Quote' (by ID) in which the Phenomena are described:

Should the Event refer directly to the 'Source' in which the Quote describing the Phenomena can be found:

Should the Event contain a 'DOI' when the dataset is published somewhere:

Should the Event contain a 'License' when the dataset is published somewhere:

Phenomena/Coding

Historical documents contain a wide range of information, so the coding schema is complex. The following examples and their coding illustrate this:

  • The weather was extremely warm -> temperature index + very hot (+3)
  • A lot of rain fell in December -> precipitation type + rain & precipitation amount + more than usual (+1)
  • The water level was 13 feet -> water level + value:13.0 + tolerance:1.0 + unit:feet
  • The drought led to a bad harvest of potatoes -> harvest + harvest amount: less than usual (-1) & potatoes / precipitation amount: less than usual (-1)
  • The wheat harvest was little but the wine quality was good -> harvest + harvest amount: less than usual (-1) & wheat / harvest + harvest quality: better than usual (+1) & wine

+: one-dimensional combination

&: two- (or more-) dimensional combination

/: two different events (e.g. cause vs. effect)
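The notation can be read mechanically. The hypothetical parser below splits a coding string on ` / ` into events, each event on ` & ` into dimensions, and each dimension on ` + ` into components. It assumes the separators are surrounded by spaces, so that level indices such as `(+3)` are not split, and `level_of` extracts a parenthesized index where present. It only illustrates the notation; it is not the tambora.org implementation.

```python
import re
from typing import List, Optional

# Parsed structure: events -> dimensions -> components
Parsed = List[List[List[str]]]

LEVEL_RE = re.compile(r"\(([+-]?\d+)\)")  # matches "(+3)", "(-1)", "(0)"


def parse_coding(code: str) -> Parsed:
    """Split a coding string into events (' / '), dimensions (' & ') and components (' + ')."""
    return [
        [[component.strip() for component in dimension.split(" + ")]
         for dimension in event.split(" & ")]
        for event in code.split(" / ")
    ]


def level_of(component: str) -> Optional[int]:
    """Return the parenthesized level index of a component, or None if there is none."""
    match = LEVEL_RE.search(component)
    return int(match.group(1)) if match else None


example = ("harvest + harvest amount: less than usual (-1) & potatoes"
           " / precipitation amount: less than usual (-1)")
for event in parse_coding(example):
    print(event)
```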

(Coding tree as used in tambora.org can be found here.)

To handle the large number of different parameters, it makes sense to group them into thematic clusters.

Cluster: Temperature

This cluster contains all temperature-related parameters: temperature measurements, descriptive temperature levels, freezing events, ...



Levels could, for example, be:

-3: very cold
-2: cold
-1: cool
 0: normal temperature
+1: warm
+2: hot
+3: very hot
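The seven-level scale maps naturally onto an integer enumeration, which rejects out-of-range indices. The reduction to three levels (-1, 0, +1) is a hypothetical illustration of how data coded at different resolutions might be compared, since the number of levels is still open.

```python
from enum import IntEnum


class TemperatureLevel(IntEnum):
    """Seven-level descriptive temperature index (values as listed above)."""
    VERY_COLD = -3
    COLD = -2
    COOL = -1
    NORMAL = 0
    WARM = 1
    HOT = 2
    VERY_HOT = 3


def to_three_levels(level: int) -> int:
    """Hypothetical reduction to -1 (cold side), 0 (normal) or +1 (warm side)."""
    return int(level > 0) - int(level < 0)


print(TemperatureLevel(3).name)   # VERY_HOT
print(to_three_levels(TemperatureLevel.COOL))
```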
Should the 'temperature level' (indices) be:

How many levels should the temperature have:

Trends could, for example, be:

-1: decreasing
 0: constant
+1: increasing
Should the 'temperature trend' (indices) be:

The temperature value is a measured numerical value with a unit. Standards from other archive types can be used here.
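Because historical measurements come in several temperature scales, a conversion step is likely needed. A minimal sketch, assuming Fahrenheit and Réaumur readings (the latter common in older European sources) and the hypothetical unit labels `degC`, `degF` and `degRe`: the value is converted with scale factor and offset, while the tolerance, being a temperature difference, is converted with the scale factor only.

```python
from typing import Tuple

# Hypothetical unit labels -> (factor, offset) with celsius = value * factor + offset
TO_CELSIUS = {
    "degC": (1.0, 0.0),
    "degF": (5.0 / 9.0, -160.0 / 9.0),   # (F - 32) * 5/9
    "degRe": (1.25, 0.0),                # Réaumur: 80 degRe = 100 degC
}


def to_celsius(value: float, tolerance: float, unit: str) -> Tuple[float, float]:
    """Convert a reading and its tolerance to degrees Celsius."""
    try:
        factor, offset = TO_CELSIUS[unit]
    except KeyError:
        raise ValueError(f"unsupported temperature unit: {unit}") from None
    # A tolerance is a temperature difference: scale it, but never apply the offset.
    return value * factor + offset, abs(tolerance * factor)


print(to_celsius(20.0, 1.0, "degRe"))   # (25.0, 1.25)
```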

Should the 'temperature value' (measurement) be:

Should the 'temperature unit' (measurement) be:

Should the 'tolerance of the temperature' measurement be:

The following two parameters are Boolean: the phenomenon either occurs or it does not.

Should the occurrence of 'freezing/frost/ice' be:

Should the occurrence of 'thawing/melting' be:

Cluster: Precipitation

This cluster contains all precipitation-related parameters: long- and short-term amounts, intensities, measurements, type of precipitation, ...



Levels of long-term precipitation (weeks to months) could, for example, be:

-3: extremely dry
-2: very dry
-1: dry
 0: normal precipitation
+1: wet
+2: very wet
+3: extremely wet
Should the 'long-term precipitation level' be:

How many levels should the long-term precipitation level have:

Levels of short-term precipitation (hours to days) could, for example, be:

 0: no precipitation
+1: some precipitation
+2: much precipitation
+3: very much precipitation
+4: extremely much precipitation
Should the 'short-term precipitation level' be:

How many levels should the short-term precipitation level have:

The amount of precipitation is a measured value for a given time frame (see Time).

Should the 'precipitation amount' (measurement) be:

Should the 'precipitation amount unit' (measurement) be:

The intensity of precipitation is a measured value, often a peak value.

Should the 'precipitation intensity' (measurement) be:

Should the 'precipitation intensity unit' (measurement) be:

The precipitation type is a string, e.g. rain, snow, hail, dew, sleet, ... It can be combined with the quantitative parameters above, but can also stand alone.
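Treating the precipitation type as a controlled vocabulary ("options, choice") could look like this. The vocabulary is taken from the examples above and is deliberately incomplete; the comma-separated input format, the record keys and the function name are assumptions.

```python
from typing import List

# Controlled vocabulary taken from the examples in the text; deliberately incomplete.
PRECIPITATION_TYPES = frozenset({"rain", "snow", "hail", "dew", "sleet"})


def normalize_precipitation_types(raw: str) -> List[str]:
    """Parse a comma-separated list of types and check it against the vocabulary."""
    types = {t.strip().lower() for t in raw.split(",") if t.strip()}
    unknown = sorted(types - PRECIPITATION_TYPES)
    if unknown:
        raise ValueError(f"unknown precipitation type(s): {unknown}")
    return sorted(types)


# The type can stand alone or be combined with a quantitative level.
observation = {"precipitation_type": normalize_precipitation_types("Rain, snow"),
               "short_term_level": 2}
print(observation)
```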

Should the 'precipitation type' (string, options, choice) be:

Levels of snow cover could, for example, be:

 0: no snow cover
+1: thin snow cover
+2: deep snow cover
Should the 'snow cover level' be:

How many levels should the snow cover level have:

The snow depth is static (accumulated), whereas the snowfall refers to a given time frame.

Should the 'snow depth' (measurement) be:

Should the 'snow depth unit' (measurement) be:

Should the 'snow fall' (measurement) be:

Should the 'snow fall unit' (measurement) be:


Cluster: Hydrology

Cluster: Clouds, Visibility

Cluster: Wind, Air Pressure

Cluster: Society

Cluster: Plant Phenology

Cluster: Animal Phenology

Cluster: Economy

(codeset_id,codeset_description,category,path,node_label,scale_label,scale_unit,value_label,value_index,average,variance,si_unit,si_average,si_variance )

https://sweet.jpl.nasa.gov/

Example

Rename the LiPD file extension from .zip to .lpd.

Description Tambora-Files LiPD-Files Remarks
Flood Example Media:flood_tambora_csv.zip Media:Exp0000.tambora.2017.zip
To-Do To-Do To-Do
To-Do To-Do To-Do
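The renaming step described above can be scripted. A minimal sketch using only Python's standard library; the function name `zip_to_lpd` is hypothetical, and `Path.rename` returns the new path on Python 3.8 and later.

```python
from pathlib import Path


def zip_to_lpd(path) -> Path:
    """Rename a LiPD archive from *.zip to *.lpd and return the new path."""
    p = Path(path)
    if p.suffix.lower() != ".zip":
        raise ValueError(f"expected a .zip file, got {p.name}")
    # with_suffix replaces only the last suffix: "Exp0000.tambora.2017.zip" -> "...2017.lpd"
    return p.rename(p.with_suffix(".lpd"))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(zip_to_lpd(name))
```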

Media in category "Historical Documents Working Group"

The following 6 files are in this category, out of 6 total.