Category:Historical Documents Working Group

From Linked Earth Wiki
Revision as of 08:15, 26 April 2019 by KMicha (Talk | contribs)



Overview

In the Linked Earth context, a working group (WG) is a self-organized coalition of knowledgeable experts, whose activities are governed herewith. This page is dedicated to the discussion of data and metadata standards for historical documents, and aims to formulate a set of recommendations for such a standard.



Sources

Data are usually compiled from several historical sources. The LiPD data structure supports multiple Publication entries, which normally refer to the publication describing the data. This data cluster can therefore be used for historical sources as well, in addition to the current publication describing the data. It is related to known standards such as Dublin Core or BibTeX.

  • Source-ID for later reference
  • Source Type (string), e.g. newspaper, book, ...
  • Source Author
  • Source Title
  • Publication Year
  • Publication Date
  • Journal Title
  • Publisher
  • URL to Source (e.g. to a PDF)
  • DOI of Source (almost never exists for historical documents unless they were published as a data compilation)
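As a minimal sketch, the source fields listed above could be held in a small Python record. The attribute names and the example values are assumptions made for illustration; they are not agreed LiPD keys, and which fields are required is still subject to the polls below.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Source:
    """One historical source; attribute names are illustrative, not LiPD keys."""
    source_id: str                          # Source-ID for later reference
    source_type: str                        # e.g. "newspaper", "book"
    author: str
    title: str
    publication_year: Optional[int] = None
    publication_date: Optional[str] = None  # ISO 8601 string when known
    journal: Optional[str] = None
    publisher: Optional[str] = None
    url: Optional[str] = None               # e.g. link to a PDF
    doi: Optional[str] = None               # rarely available for historical documents


def to_record(source: Source) -> dict:
    """Return a plain dict without unset fields, ready for JSON serialization."""
    return {key: value for key, value in asdict(source).items() if value is not None}


# Invented example values, for illustration only.
example = Source(source_id="S1", source_type="newspaper",
                 author="Unknown", title="Example Gazette",
                 publication_year=1784)
record = to_record(example)
```

Leaving optional fields out of the record, rather than writing empty strings, keeps a missing DOI distinguishable from an empty one.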


  • Source
Should the 'Source ID' be:
Should the 'Source Type' be:
Should the 'Author of Source' be:
Should the 'Title of Source' be:
Should the 'Publication Year' of the (original-) source be:
Should the 'Publication Date' of the (original-) source be:
Should the 'Journal' be:
Should the 'Publisher' be:
Should the 'URL to Source' be:
Should the 'DOI of Source' be:

Scans, Pages

Each source can optionally have a set of images, namely the scans of its pages. For the LiPD format this could be dropped, so that only references to externally stored images are added to the quotes.

Quotes

From each source, several quotes can be extracted by transcription. The related data would most probably go into "measurementTables".

  • Quote-ID: For later reference
  • Reference to source: maybe short form like "Author()Year" is adequate
  • Page: String with the page number(s) from which the quote is extracted
  • Scan: Optional link(s) to externally stored image(s) of the page(s)
  • Language: The language of the quote (or of its translation)
  • Protolanguage: The language in which the quote was originally written
  • Quote: The quote itself (UTF-8)
  • License: License of the quote (e.g. CC)
  • DOI: DOI under which the quote is published
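The quote fields above, and the link from a quote back to its source, could be sketched as follows. All names are hypothetical, the example values are invented, and the reference is assumed to be a plain source ID because the short reference format is still undecided.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class Quote:
    """One transcribed quote; attribute names are illustrative only."""
    quote_id: str                            # unique within the dataset
    source_ref: str                          # assumed here to be a source ID
    page: str                                # page number(s) kept as a string
    language: str                            # language of the quote or its translation
    text: str                                # the quote itself
    original_language: Optional[str] = None  # the "protolanguage"
    scan_urls: List[str] = field(default_factory=list)
    license: Optional[str] = None
    doi: Optional[str] = None


def unresolved_quotes(quotes: Iterable[Quote], known_source_ids: Iterable[str]) -> List[str]:
    """Return the IDs of quotes whose source reference matches no known source."""
    known = set(known_source_ids)
    return [quote.quote_id for quote in quotes if quote.source_ref not in known]


# Invented example values, for illustration only.
quotes = [
    Quote("Q1", "S1", "12", "en", "The river rose for three days."),
    Quote("Q2", "S9", "3-4", "en", "A hard frost destroyed the vines."),
]
dangling = unresolved_quotes(quotes, ["S1", "S2"])
```

A check like this helps when a compilation is assembled from many sources, where a mistyped reference would otherwise go unnoticed.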


  • Quote
Should the 'Quote-ID' (unique for dataset) be:
Should the 'Reference to Source' be:
Should the Format of 'Reference to Source' be:
Should the 'Page of Quote' inside Source be:
Should the 'URL to Scan' of Page containing the Quote be:
Should the 'Language' of Quote or its Translation be:
Should the 'Protolanguage' of Quote be:
Should the 'Quote itself' be:
Should the 'License' of published Quote be:
Should the 'DOI' of published Quote be:

Events

Events, which are more like interpretations of a quote, would go into the "model part". Each event refers to

  • a quote
  • a position
  • a time

and contains climate-related data (to be defined).
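The three references held by an event can be illustrated with a small lookup. This is only a sketch: the names are hypothetical, the quotes and positions are reduced to plain strings, the values are invented, and the `coding` field is a placeholder because the climate-related payload is not yet defined.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Event:
    """An interpretation of a quote; attribute names are illustrative only."""
    event_id: str
    quote_id: str      # reference to a quote
    position_id: str   # reference to a position
    time: str          # ISO 8601 string
    coding: str        # placeholder for the climate-related data


def resolve(event: Event, quotes: Dict[str, str], positions: Dict[str, str]) -> dict:
    """Join an event with the quote text and position name it refers to.

    A KeyError is raised if a reference points to an unknown ID.
    """
    return {
        "event": event.event_id,
        "quote": quotes[event.quote_id],
        "position": positions[event.position_id],
        "time": event.time,
        "coding": event.coding,
    }


# Invented example values, for illustration only.
quotes = {"Q1": "The river rose for three days."}
positions = {"P1": "Example Town"}
event = Event("E1", "Q1", "P1", "1784-02-28", "water level")
joined = resolve(event, quotes, positions)
```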

Position

Unlike for other archive types, the position is not fixed. A compilation of several sources usually refers to different, scattered locations. A location can usually be named, but it may cover different scales (continent, country, area, city, street, ...) and terrain types (city, river, sea, ...). It corresponds best to the geospatial metadata of LiPD; see also http://schema.org/Place.


  • Position
Should the 'Identifier of Location' (unique for dataset) be:
Should the 'Name of Location' be:
Should the 'Type of Location' be:
Should the 'Latitude of Location' be:
Should the 'Longitude of Location' be:
Should the 'Altitude/Elevation of Location' be:
Should the '(JSON)-Geometry of Location' be:
Should the 'Reference to Geonames' by ID be:
Should the extracted corresponding quote piece = 'Phrase' be:
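One possible shape for a location record, covering the fields raised in the position polls, is a GeoJSON-style Feature (RFC 7946 puts longitude before latitude, with an optional elevation as third coordinate). The function name, the property keys, and the example values are assumptions for illustration, and only a point geometry is shown, although larger locations would need other geometry types.

```python
from typing import Optional


def location_feature(location_id: str, name: str, location_type: str,
                     latitude: float, longitude: float,
                     elevation: Optional[float] = None,
                     geonames_id: Optional[int] = None) -> dict:
    """Build a GeoJSON-style Feature with a Point geometry for one location."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError(f"longitude out of range: {longitude}")
    coordinates = [longitude, latitude]  # GeoJSON order: longitude first
    if elevation is not None:
        coordinates.append(elevation)
    return {
        "type": "Feature",
        "id": location_id,
        "geometry": {"type": "Point", "coordinates": coordinates},
        "properties": {
            "name": name,
            "locationType": location_type,  # hypothetical key, e.g. "city", "river"
            "geonamesId": geonames_id,      # hypothetical key; None when unknown
        },
    }


# Invented example values, for illustration only.
feature = location_feature("P1", "Example Town", "city", latitude=50.0, longitude=8.0)
```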


Time

The time derived for an event from historical documents is usually more precise than for other archive types; often the uncertainty is only days or even hours. It would therefore be best to code the absolute date in the Gregorian calendar as an ISO 8601 string. Optionally, an absolute Gregorian year (possibly as a floating-point fraction rather than an integer) can be calculated as well.
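The optional fractional year can be derived from the ISO 8601 string, for example as sketched below. The function name is hypothetical, and the sketch relies on Python's `datetime`, which uses the proleptic Gregorian calendar and only supports years 1 to 9999.

```python
from datetime import datetime


def decimal_year(iso_string: str) -> float:
    """Convert an ISO 8601 date or date-time string to a fractional Gregorian year."""
    moment = datetime.fromisoformat(iso_string)
    # Keep the time zone of the input (if any) so the subtractions below are valid.
    start = datetime(moment.year, 1, 1, tzinfo=moment.tzinfo)
    end = datetime(moment.year + 1, 1, 1, tzinfo=moment.tzinfo)
    return moment.year + (moment - start) / (end - start)
```

For instance, `decimal_year("2000-07-02")` gives 2000.5, because 2 July is the midpoint of a leap year (183 of 366 days elapsed).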

  • Time
How should the event time primarily be defined:

Original sources often give dates in a calendar other than the Gregorian one. The Julian calendar was used in Europe in historical times, and outside Europe other calendars existed or still exist, for example in Chinese or Arabic documents.
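For the Julian case, a conversion to the Gregorian calendar can go through the Julian Day Number (JDN). The sketch below uses the standard integer formula for Julian calendar dates and Python's proleptic Gregorian ordinals; the function names are hypothetical, and only years from 1 CE onward are handled. Other calendar systems (for example Chinese or Islamic dates) need dedicated conversions and are not covered here.

```python
from datetime import date

# date(1, 1, 1).toordinal() == 1 corresponds to JDN 1721426,
# so ordinal = JDN - 1721425.
JDN_TO_ORDINAL = 1721425


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian calendar date to the (proleptic) Gregorian calendar."""
    return date.fromordinal(julian_to_jdn(year, month, day) - JDN_TO_ORDINAL)
```

As a check, `julian_to_gregorian(1582, 10, 5)` returns `date(1582, 10, 15)`, the first day of the Gregorian calendar. Keeping the original notation alongside the converted ISO 8601 string lets a later reader verify the conversion.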

Should the time optionally be coded in 'Other Calendar Systems':

If so, which parameters for start and end should be considered?

  • Cycle
  • Year
  • Season / Solar Term
  • Month
  • Day
  • Hour

It is also possible to extract the time related text from the quote and add it to a field 'phrase':

Phrase:


Event itself

The event refers to the quote (and possibly directly to the source), to the chronology/timing, and to the position/location. It also holds the coding of the phenomena, which is where it gets complicated; see the next section.

Should the Event refer to 'Location' of described Phenomena:
Should the Event refer to 'Timing' of described Phenomena:
Should the Event refer to 'Quote' (by ID) where Phenomena is described:
Should the Event refer directly to the 'Source' where Quote describing Phenomena can be found:
Should the Event contain a 'DOI' when data-set is published somewhere:
Should the Event contain a 'License' when data-set is published somewhere:

Phenomena/Coding

Historical documents contain a wide range of information, so the coding scheme is complex. Some examples and their coding illustrate this:

  • The weather was extremely warm -> temperature index + very hot (+3)
  • A lot of rain fell in December -> precipitation type + rain & precipitation amount + more than usual (+1)
  • The water level was 13 feet -> water level + value:13.0 + tolerance:1.0 + unit:feet
  • The drought led to a bad potato harvest -> harvest + harvest amount: less than usual (-1) & potatoes / precipitation amount: less than usual (-1)
  • The wheat harvest was small but the wine quality was good -> harvest + harvest amount: less than usual (-1) & wheat / harvest + harvest quality: better than usual (+1) & wine

+: one-dimensional combination

&: two- (or more-) dimensional combination

/: two different events (e.g. cause vs. effect)
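The three separators can be read as a small nested notation: "/" splits events, "&" splits dimensions within an event, and "+" splits terms within a dimension. The sketch below parses the example codings under that reading. It assumes that separators are always surrounded by spaces (so the "+" in "(+3)" is not a separator), and the function names are hypothetical.

```python
import re
from typing import List, Optional

# Matches a trailing signed index such as "(+3)" or "(-1)".
INDEX_PATTERN = re.compile(r"\(([+-]?\d+)\)\s*$")


def parse_coding(coding: str) -> List[List[List[str]]]:
    """Split a coding string into events -> dimensions -> terms."""
    events = []
    for event in coding.split(" / "):
        dimensions = []
        for dimension in event.split(" & "):
            dimensions.append([term.strip() for term in dimension.split(" + ")])
        events.append(dimensions)
    return events


def index_of(term: str) -> Optional[int]:
    """Return the signed index at the end of a term, or None if there is none."""
    match = INDEX_PATTERN.search(term)
    return int(match.group(1)) if match else None


parsed = parse_coding(
    "harvest + harvest amount: less than usual (-1) & potatoes"
    " / precipitation amount: less than usual (-1)"
)
```

Here `parsed` holds two events: the first has the two dimensions `["harvest", "harvest amount: less than usual (-1)"]` and `["potatoes"]`, the second has the single dimension `["precipitation amount: less than usual (-1)"]`.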

(Coding tree as used in tambora.org can be found here.)

(codeset_id,codeset_description,category,path,node_label,scale_label,scale_unit,value_label,value_index,average,variance,si_unit,si_average,si_variance )
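A coding tree with the columns listed above could be loaded as sketched below. This assumes a comma-separated file whose header row matches that column list exactly; the real export may differ. The function name, the choice of numeric columns, and the sample row are assumptions made for illustration only.

```python
import csv
import io
from typing import List

COLUMNS = [
    "codeset_id", "codeset_description", "category", "path", "node_label",
    "scale_label", "scale_unit", "value_label", "value_index", "average",
    "variance", "si_unit", "si_average", "si_variance",
]
# Columns assumed to be numeric; empty cells become None.
NUMERIC = {"value_index", "average", "variance", "si_average", "si_variance"}


def read_codeset(handle) -> List[dict]:
    """Read coding-tree rows and convert the numeric columns to float."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if key in NUMERIC:
                row[key] = float(value) if value != "" else None
            else:
                row[key] = value
        rows.append(row)
    return rows


# Invented sample row, for illustration only.
SAMPLE = ",".join(COLUMNS) + "\n" + ",".join([
    "1", "example codeset", "temperature", "temperature/index",
    "temperature index", "index", "", "very hot", "3", "", "", "", "", "",
]) + "\n"
rows = read_codeset(io.StringIO(SAMPLE))
```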

https://sweet.jpl.nasa.gov/

Example

Rename the file extension of the LiPD files from zip to lpd.

Description    Tambora-Files                 LiPD-Files                       Remarks
Flood Example  Media:flood_tambora_csv.zip   Media:Exp0000.tambora.2017.zip
To-Do          To-Do                         To-Do
To-Do          To-Do                         To-Do
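The renaming of a downloaded LiPD example from zip to lpd can also be scripted. The following is a minimal sketch with a hypothetical helper name; it only changes the file name and does not check that the archive is a valid LiPD file.

```python
from pathlib import Path


def zip_to_lpd(path: Path) -> Path:
    """Rename a '.zip' file to the same name with an '.lpd' extension."""
    if path.suffix.lower() != ".zip":
        raise ValueError(f"not a zip file: {path}")
    target = path.with_suffix(".lpd")   # replaces only the last suffix
    if target.exists():
        raise FileExistsError(f"target already exists: {target}")
    path.rename(target)
    return target
```

For example, `zip_to_lpd(Path("Exp0000.tambora.2017.zip"))` renames the file to `Exp0000.tambora.2017.lpd` in the same directory.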
