In the Linked Earth context, a working group (WG) is a self-organized coalition of knowledgeable experts, whose activities are governed herewith. This page is dedicated to the discussion of data and metadata standards for historical documents, and aims to formulate a set of recommendations for such a standard.
Data is usually compiled from different historical sources.
The LiPD data structure supports several Publication entries, which are normally used to refer to the publication
describing the data. This data cluster can therefore be used for historical sources as well, in addition to the current publication describing
the data. It is related to known standards such as Dublin Core or BibTeX.
Source-ID for later reference
Source Type (string), e.g. newspaper, book, ...
Source Author
Source Title
Publication Year
Publication Date
Journal Title
Publisher
URL to Source (e.g. to a PDF)
DOI of Source (almost never exists for historical documents unless they are published as part of a data compilation)
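As a rough illustration of the field list above, a source could be held in a small record type before it is written to LiPD. This is only a sketch: the class name `HistoricalSource`, the attribute names and the helper `to_record` are hypothetical and are not LiPD keys, and the example values are invented.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class HistoricalSource:
    """One historical source. Field names are illustrative, not LiPD keys."""
    source_id: str                           # Source-ID for later reference
    source_type: str                         # e.g. "newspaper", "book"
    author: Optional[str] = None
    title: Optional[str] = None
    publication_year: Optional[int] = None
    publication_date: Optional[str] = None   # ISO 8601 string, if known
    journal: Optional[str] = None
    publisher: Optional[str] = None
    url: Optional[str] = None                # e.g. link to a PDF
    doi: Optional[str] = None                # rarely available for historical documents


def to_record(source: HistoricalSource) -> dict:
    """Return a plain dict with all unset (None) fields dropped."""
    return {key: value for key, value in asdict(source).items() if value is not None}


# Invented example values, for illustration only.
example = HistoricalSource(
    source_id="S001",
    source_type="newspaper",
    title="Hypothetical Gazette",
    publication_year=1784,
)
record = to_record(example)
```

Only the identifier and the type are required in this sketch; whether the other fields should be mandatory or optional is exactly what the polls below ask.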
Source
Should the 'Source ID' be:
Should the 'Source Type' be:
Should the 'Author of Source' be:
Should the 'Title of Source' be:
Should the 'Publication Year' of the (original-) source be:
Should the 'Publication Date' of the (original-) source be:
Should the 'Journal' be:
Should the 'Publisher' be:
Should the 'URL to Source' be:
Should the 'DOI of Source' be:
Scans, Pages
Each source can optionally have a set of images: the scans of its pages.
This could perhaps be dropped for the LiPD format, with only references to external images added
to the quotes.
Quotes
From each source, several quotes can be extracted by transcription.
Related data would most probably go into "measurementTables".
Quote-ID: For later reference
Reference to source: maybe short form like "Author()Year" is adequate
Page: String with the page number(s) from which the quote was extracted
Scan: Optional link(s) to externally stored image(s) of the page(s)
Language: The language of the quote (or its translation)
Protolanguage: The language in which the quote was originally written
Quote: The quote itself (UTF-8)
License: License of the quote (cc)
DOI: DOI under which the quote is published
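Because every quote points back to a source, it can be useful to check that each reference resolves. The sketch below assumes that quotes refer to sources by their Source-ID (one possible answer to the polls below, not a decision of the working group). The names `Quote` and `unresolved_quotes`, the two-letter language codes and the example data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Quote:
    """One transcribed quote. Field names are illustrative, not LiPD keys."""
    quote_id: str                            # Quote-ID for later reference
    source_ref: str                          # reference to the source (here: its Source-ID)
    page: str                                # page number(s) as a string, e.g. "12-13"
    text: str                                # the quote itself
    language: str                            # language of the quote or its translation
    original_language: Optional[str] = None  # "Protolanguage" in the list above
    scan_urls: List[str] = field(default_factory=list)
    license: Optional[str] = None
    doi: Optional[str] = None


def unresolved_quotes(quotes: List[Quote], source_ids: Set[str]) -> List[str]:
    """Return the IDs of quotes whose source reference is not a known Source-ID."""
    return [q.quote_id for q in quotes if q.source_ref not in source_ids]


# Invented example data: the second quote points to a source that is not listed.
quotes = [
    Quote("Q001", "S001", "12-13", "The weather was extremely warm.", "en", "de"),
    Quote("Q002", "S999", "4", "A lot of rain fell in December.", "en"),
]
missing = unresolved_quotes(quotes, {"S001"})
```

Here `missing` lists the quotes that would need a source entry before the dataset is consistent.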
Quote
Should the 'Quote-ID' (unique for dataset) be:
Should the 'Reference to Source' be:
Should the Format of 'Reference to Source' be:
Should the 'Page of Quote' inside Source be:
Should the 'URL to Scan' of Page containing the Quote be:
Should the 'Language' of Quote or its Translation be:
Should the 'Protolanguage' of Quote be:
Should the 'Quote itself' be:
Should the 'License' of published Quote be:
Should the 'DOI' of published Quote be:
Events
Events, which are more like interpretations of the quote, would go
into the "model part". Each event refers to
a quote
a position
a time
and contains climate-related data (to be defined).
Position
Unlike for the other archive types, the position is not fixed. A compilation of several sources usually refers to different, scattered locations.
The location can usually be named, but may cover different scales (continent, country, area, city, street, ...) and terrain types (city, river, sea, ...).
It best corresponds to the Geospatial metadata of LiPD; see also http://schema.org/Place .
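A named location with coordinates could be written as a GeoJSON-style feature, as sketched below. The function name `make_location` and the property names are hypothetical, not LiPD keys. The coordinate order (longitude, latitude, optional elevation) follows GeoJSON, and the example coordinates are approximate and for illustration only.

```python
from typing import Optional


def make_location(location_id: str, name: str, location_type: str,
                  latitude: float, longitude: float,
                  elevation: Optional[float] = None,
                  geonames_id: Optional[int] = None) -> dict:
    """Build a GeoJSON-style point feature for one named location."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    # GeoJSON positions are ordered longitude, latitude, then optional elevation.
    coordinates = [longitude, latitude]
    if elevation is not None:
        coordinates.append(elevation)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": coordinates},
        "properties": {
            "locationId": location_id,    # unique within the dataset
            "name": name,
            "locationType": location_type,  # e.g. "city", "river", "sea"
            "geonamesId": geonames_id,    # left as None when unknown
        },
    }


# Approximate, illustrative coordinates.
place = make_location("L001", "Freiburg im Breisgau", "city",
                      latitude=47.999, longitude=7.842, elevation=278.0)
```

A point is the simplest case. Locations at larger scales (a country, a river) would need a line or polygon geometry instead, which is what the '(JSON)-Geometry of Location' poll below refers to.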
Position
Should the 'Identifier of Location' (unique for dataset) be:
Should the 'Name of Location' be:
Should the 'Type of Location' be:
Should the 'Latitude of Location' be:
Should the 'Longitude of Location' be:
Should the 'Altitude/Elevation of Location' be:
Should the '(JSON)-Geometry of Location' be:
Should the 'Reference to Geonames' by ID be:
Should the extracted corresponding quote piece = 'Phrase' be:
Time
The time derived for an event from historical documents is usually more precise than for other archive types.
Often the uncertainty is just days or even hours. It would therefore be best to code the absolute date in the Gregorian
calendar as an ISO 8601 string. Optionally, an absolute Gregorian year (possibly with a floating-point fraction
instead of an integer) can be calculated as well.
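The optional fractional year can be derived from the ISO 8601 string. The following is a minimal sketch using Python's standard `datetime` module, which works in the proleptic Gregorian calendar for years 1 to 9999. The function name `decimal_year` is hypothetical, and the sketch assumes strings without a time-zone suffix.

```python
from datetime import datetime


def decimal_year(iso_string: str) -> float:
    """Convert an ISO 8601 date or date-time string to a fractional Gregorian year."""
    moment = datetime.fromisoformat(iso_string)
    start = datetime(moment.year, 1, 1)       # beginning of the year
    end = datetime(moment.year + 1, 1, 1)     # beginning of the next year
    # Elapsed fraction of the year; leap years are handled via the year length.
    return moment.year + (moment - start) / (end - start)


mid_1783 = decimal_year("1783-07-02T12:00:00")   # 1783 has 365 days
mid_1784 = decimal_year("1784-07-02")            # 1784 is a leap year (366 days)
```

Both example dates fall exactly in the middle of their year, so both results end in .5. The ISO 8601 string remains the primary value in this sketch, because the fractional year loses the calendar structure.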
Time
How should the event time primarily be defined:
The original sources often give dates in a calendar notation other than Gregorian.
The Julian calendar was used in Europe in historical times, but outside Europe other calendars existed
or still exist, for example in Chinese or Arabic documents.
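For the Julian case, the conversion to the Gregorian calendar is purely arithmetic and can go through the Julian Day Number. The sketch below uses the standard integer formula for Julian calendar dates and Python's proleptic Gregorian `date` ordinals. The function names are hypothetical. Other calendar systems (for example Chinese or Islamic) need dedicated conversion tables or libraries and are not covered here.

```python
from datetime import date

# Offset between the Julian Day Number and Python's proleptic Gregorian ordinal
# (date(1, 1, 1).toordinal() == 1 corresponds to Julian Day Number 1721426).
_JDN_TO_ORDINAL = 1721425


def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar (AD years)."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian calendar date to the (proleptic) Gregorian calendar."""
    return date.fromordinal(julian_calendar_to_jdn(year, month, day) - _JDN_TO_ORDINAL)


# The day after Julian 4 October 1582 was Gregorian 15 October 1582.
first_gregorian_day = julian_to_gregorian(1582, 10, 5)
iso_string = first_gregorian_day.isoformat()
```

Keeping the original calendar notation next to the converted ISO 8601 string lets a later reader check the conversion.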
Should the time optionally be coded in 'Other Calendar Systems':
If so, which parameters for start and end should be considered?
Cycle:
Year:
Season / Solar Term:
Month:
Day:
Hour:
It is also possible to extract the time related text from the quote and add it to a field 'phrase':
Phrase:
Event itself
The event refers to the quote (and possibly directly to the source), to the chronology/timing, and to the position/location.
It also holds the coding of the phenomena. This is where it gets complicated; see the next section.
Should the Event refer to 'Location' of described Phenomena:
Should the Event refer to 'Timing' of described Phenomena:
Should the Event refer to 'Quote' (by ID) where Phenomena is described:
Should the Event refer directly to the 'Source' where Quote describing Phenomena can be found:
Should the Event contain a 'DOI' when data-set is published somewhere:
Should the Event contain a 'License' when data-set is published somewhere:
Phenomena/Coding
A lot of information can be found inside historical documents, so the coding scheme is complex.
Some examples and their coding to illustrate this:
The weather was extremely warm -> temperature index + very hot (+3)
A lot of rain fell in December -> precipitation type + rain & precipitation amount + more than usual (+1)
The water level was 13 feet -> water level + value:13.0 + tolerance:1.0 + unit:feet
The drought led to a bad harvest of potatoes -> harvest + harvest amount: less than usual (-1) & potatoes / precipitation amount: less than usual (-1)
The wheat harvest was little but the wine quality was good -> harvest + harvest amount: less than usual (-1) & wheat / harvest + harvest quality: better than usual (+1) & wine
+: one dimensional combination
&: two (or more) dimensional combinations
/: two different events (i.e. cause vs effect)
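To make the three separators concrete, the codings above can be split into a nested structure: events first (`/`), then dimensions (`&`), then the steps within one dimension (`+`). This is only one possible reading of the notation used on this page, not the format used by tambora.org. The function name `parse_coding` is hypothetical, and the sketch assumes the separators are always surrounded by spaces, so that values such as "(+1)" are left intact.

```python
from typing import List


def parse_coding(coding: str) -> List[List[List[str]]]:
    """Split a coding string into events -> dimensions -> steps."""
    events = []
    for event_text in coding.split(" / "):            # separate events (e.g. cause vs. effect)
        dimensions = []
        for dimension_text in event_text.split(" & "):  # combined dimensions of one event
            steps = [step.strip() for step in dimension_text.split(" + ")]
            dimensions.append(steps)
        events.append(dimensions)
    return events


rain = parse_coding(
    "precipitation type + rain & precipitation amount + more than usual (+1)"
)
drought = parse_coding(
    "harvest + harvest amount: less than usual (-1) & potatoes"
    " / precipitation amount: less than usual (-1)"
)
```

In this reading, `rain` is a single event with two dimensions, while `drought` yields two events: the harvest effect and the precipitation cause.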
(Coding tree as used in tambora.org can be found here.)
To handle the large number of different parameters, it makes sense to group them into thematic clusters.
Cluster: Temperature
This cluster contains all temperature-related parameters: temperature measurements, descriptive temperature levels, freezing events, ...
temperature-cluster
Levels can be, for example:
-3: very cold
-2: cold
-1: cool
0: normal temperature
+1: warm
+2: hot
+3: very hot
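The seven-step scale above maps naturally onto an integer enumeration, which keeps the indices ordered and rejects values outside the scale. This is a sketch assuming the seven levels listed here; the number of levels is still open (see the poll below), and the names `TemperatureLevel` and `level_label` are hypothetical.

```python
from enum import IntEnum


class TemperatureLevel(IntEnum):
    """Descriptive temperature index, from -3 (very cold) to +3 (very hot)."""
    VERY_COLD = -3
    COLD = -2
    COOL = -1
    NORMAL = 0
    WARM = 1
    HOT = 2
    VERY_HOT = 3


def level_label(index: int) -> str:
    """Return a readable label for an index; raises ValueError outside -3..+3."""
    return TemperatureLevel(index).name.replace("_", " ").lower()


label = level_label(3)   # "very hot"
is_colder = TemperatureLevel.COOL < TemperatureLevel.NORMAL
```

Because the members are integers, they can be stored directly as numeric index values and still be compared or averaged.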
Should the 'temperature level' (indices) be:
How many levels should the temperature have:
Trend can be, for example:
-1: decreasing
0: constant
+1: increasing
Should the 'temperature trend' (indices) be:
The temperature value is a measured numerical value with units.
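Historical measurements may be given in different units, so a measured value, its unit and its tolerance could be normalised to one unit for comparison. The sketch below converts to degrees Celsius. The function name `to_celsius` and the unit keys are hypothetical, and the choice of supported units is an assumption. Note that the tolerance is an interval width, so only the scale factor applies to it, not the offset.

```python
from typing import Tuple

# unit -> (scale, offset), with celsius = (value - offset) * scale
_TO_CELSIUS = {
    "celsius": (1.0, 0.0),
    "kelvin": (1.0, 273.15),
    "fahrenheit": (5.0 / 9.0, 32.0),
    "reaumur": (1.25, 0.0),
}


def to_celsius(value: float, unit: str, tolerance: float = 0.0) -> Tuple[float, float]:
    """Convert a temperature and its tolerance to degrees Celsius."""
    try:
        scale, offset = _TO_CELSIUS[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown temperature unit: {unit!r}") from None
    # The tolerance is a difference of temperatures, so the offset cancels out.
    return (value - offset) * scale, abs(tolerance) * scale


boiling, _ = to_celsius(80.0, "reaumur")                  # 80 degrees Reaumur
freezing, freezing_tol = to_celsius(32.0, "fahrenheit", tolerance=1.8)
```

Storing the original value and unit alongside the converted one keeps the record traceable to the quote.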
Should the 'temperature value' (measurement) be:
Should the 'temperature unit' (measurement) be:
Should the 'tolerance of the temperature' measurement be:
The following two parameters are of type boolean: they either occur or do not.
Should the occurrence of 'freezing/frost/ice' be:
Should the occurrence of 'thawing/melting' be: