Category:Historical Documents Working Group

From Linked Earth Wiki
Revision as of 15:23, 25 April 2019 by KMicha



Overview

In the Linked Earth context, a working group (WG) is a self-organized coalition of knowledgeable experts, whose activities are governed herewith. This page is dedicated to the discussion of data and metadata standards for historical documents, and aims to formulate a set of recommendations for such a standard.

Members of 'Historical Documents Working Group'

    This working group contains only the following member.


Sources

Data are usually compiled from several historical sources. The LiPD data structure supports multiple Publication entries, which are normally used to reference the publication describing the data. This data cluster can therefore hold the historical sources as well, in addition to the publication describing the dataset. It is related to established standards such as Dublin Core and BibTeX.

  • Source-ID for later reference
  • Source Type (string), e.g. newspaper, book, ...
  • Source Author
  • Source Title
  • Publication Year
  • Publication Date
  • Journal Title
  • Publisher
  • URL to Source (e.g. to a PDF)
  • DOI of Source (almost never exists for historical documents unless they are published as a data compilation)
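To make the field list concrete, a minimal Python sketch of one source record follows. The class name `HistoricalSource` and all field names are hypothetical and are not the keys of LiPD's Publication entry; the sketch only shows how optional fields and the consistency between publication year and full publication date might be handled.

```python
import datetime
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class HistoricalSource:
    """Hypothetical record for one historical source (field names are illustrative)."""
    source_id: str                          # referenced later by quotes
    source_type: str                        # e.g. "newspaper", "book"
    author: Optional[str] = None
    title: Optional[str] = None
    publication_year: Optional[int] = None
    publication_date: Optional[str] = None  # ISO 8601 date, "YYYY-MM-DD"
    journal: Optional[str] = None
    publisher: Optional[str] = None
    url: Optional[str] = None               # e.g. link to a PDF
    doi: Optional[str] = None               # rarely available for historical documents

    def __post_init__(self) -> None:
        # Derive the year from the full date, or check that both agree.
        if self.publication_date is not None:
            year = datetime.date.fromisoformat(self.publication_date).year
            if self.publication_year is None:
                self.publication_year = year
            elif self.publication_year != year:
                raise ValueError("publication_year does not match publication_date")

    def to_dict(self) -> dict:
        # Omit empty fields so the serialized entry stays compact.
        return {key: value for key, value in asdict(self).items() if value is not None}
```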


Should the 'Source ID' be:
There was one vote since the poll was created on 14:37, 25 April 2019.

Should the 'Source Type' be:
There was one vote since the poll was created on 14:38, 25 April 2019.

Should the 'Author of Source' be:
There was one vote since the poll was created on 14:38, 25 April 2019.

Should the 'Title of Source' be:
There was one vote since the poll was created on 14:39, 25 April 2019.

Should the 'Publication Year' of the (original) source be:
There was one vote since the poll was created on 14:39, 25 April 2019.

Should the 'Journal' be:
There was one vote since the poll was created on 14:40, 25 April 2019.

Should the 'Publisher' be:
There was one vote since the poll was created on 14:40, 25 April 2019.

Should the 'URL to Source' be:
There was one vote since the poll was created on 14:40, 25 April 2019.

Should the 'DOI of Source' be:
There was one vote since the poll was created on 14:40, 25 April 2019.

Scans, Pages

Each source can optionally have a set of images, namely scans of its pages. These could perhaps be dropped from the LiPD format, with only references to externally stored images added to the quotes.

Quotes

Several quotes can be extracted from each source by transcribing them. The related data would most likely go into "measurementTables".

  • Quote-ID: for later reference
  • Reference to source: perhaps a short form like "Author()Year" is adequate
  • Page: string giving the page number(s) from which the quote is extracted
  • Scan: optional link(s) to externally stored images of these page(s)
  • Language: the language of the quote (or of its translation)
  • Protolanguage: the language in which the quote was originally written
  • Quote: the quote itself (UTF-8)
  • License: license of the quote (e.g. Creative Commons)
  • DOI: the DOI under which the quote is published
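A hypothetical Python sketch of a quote record follows. The names `Quote` and `short_reference` are illustrative, the short form "SurnameYear" is only one possible reading of the reference format, and language codes such as "en" or "de" (ISO 639-1) are an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def short_reference(author: str, year: int) -> str:
    """Hypothetical short form "SurnameYear", assuming authors are written "Surname, Given"."""
    surname = author.split(",")[0].strip()
    return f"{surname}{year}"


@dataclass
class Quote:
    quote_id: str                                   # unique within the dataset
    source_ref: str                                 # reference to the source
    text: str                                       # the transcribed quote (UTF-8 when serialized)
    page: Optional[str] = None                      # a string, so "12-13" or "iv" are allowed
    scans: List[str] = field(default_factory=list)  # links to externally stored page images
    language: Optional[str] = None                  # language of the quote or its translation
    protolanguage: Optional[str] = None             # language of the original text
    license: Optional[str] = None                   # e.g. a Creative Commons license
    doi: Optional[str] = None                       # DOI under which the quote is published

    def is_translation(self) -> bool:
        # Treat the quote as a translation when both languages are known and differ.
        return (self.language is not None and self.protolanguage is not None
                and self.language != self.protolanguage)
```

Keeping the page as a string rather than an integer leaves room for ranges and roman numerals, which are common in older printed sources.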


Should the 'Quote-ID' (unique for dataset) be:
There was one vote since the poll was created on 14:21, 25 April 2019.

Should the 'Reference to Source' be:
There was one vote since the poll was created on 14:22, 25 April 2019.

Should the Format of 'Reference to Source' be:
There was one vote since the poll was created on 14:23, 25 April 2019.

Should the 'Page of Quote' inside Source be:
There was one vote since the poll was created on 14:24, 25 April 2019.

Should the 'URL to Scan' of Page containing the Quote be:
There was one vote since the poll was created on 14:25, 25 April 2019.

Should the 'Language' of Quote or its Translation be:
There was one vote since the poll was created on 14:25, 25 April 2019.

Should the 'Protolanguage' of Quote be:
There was one vote since the poll was created on 14:26, 25 April 2019.

Should the 'Quote itself' be:
There was one vote since the poll was created on 14:26, 25 April 2019.

Should the 'License' of published Quote be:
There was one vote since the poll was created on 14:27, 25 April 2019.

Should the 'DOI' of published Quote be:
There was one vote since the poll was created on 14:28, 25 April 2019.

Events

Events, which are closer to interpretations of the quote, would go into the "model part". Each event refers to

  • a quote
  • a position
  • a time

and contains climate-related data (to be defined).
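The three references of an event can be sketched as a small record. This is a hypothetical Python sketch: the names are illustrative, times are assumed to be ISO 8601 strings, and the climate part is left as an empty dictionary because its content is still to be defined.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Event:
    """Hypothetical event record: an interpretation of one quote at a place and time."""
    event_id: str
    quote_id: str                 # Quote-ID the event is derived from
    location_id: str              # identifier of an entry in the position table
    time_start: str               # ISO 8601 date or date-time
    time_end: str                 # same as time_start for a single instant or day
    climate: dict = field(default_factory=dict)  # climate coding, still to be defined

    def __post_init__(self) -> None:
        # Parse both bounds and reject intervals that end before they start.
        if datetime.fromisoformat(self.time_end) < datetime.fromisoformat(self.time_start):
            raise ValueError("time_end lies before time_start")
```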

Position

Unlike for the other archive types, the position is not fixed. A compilation of several sources usually refers to different, scattered locations. A location can usually be named but may cover different scales (continent, country, area, city, street, ...) and terrain types (city, river, sea, ...). It corresponds best to the Geospatial metadata of LiPD; see also http://schema.org/Place .
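Since the position corresponds to geospatial metadata, one option is to store each named location as a GeoJSON Feature. The following Python sketch works under that assumption; the function name and the property names `locationType` and `geonamesId` are hypothetical. Only a Point geometry is shown, so extended locations such as rivers or countries would need LineString or Polygon geometries instead.

```python
from typing import Optional


def location_feature(location_id: str, name: str, location_type: str,
                     lat: float, lon: float, alt: Optional[float] = None,
                     geonames_id: Optional[int] = None) -> dict:
    """Build a GeoJSON Feature for a named location (property names are hypothetical)."""
    if not -90.0 <= lat <= 90.0 or not -180.0 <= lon <= 180.0:
        raise ValueError("latitude or longitude out of range")
    # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude, altitude].
    coordinates = [lon, lat] if alt is None else [lon, lat, alt]
    return {
        "type": "Feature",
        "id": location_id,
        "geometry": {"type": "Point", "coordinates": coordinates},
        "properties": {"name": name, "locationType": location_type,
                       "geonamesId": geonames_id},
    }
```

Note the coordinate order: GeoJSON puts longitude first, which is the reverse of the usual "latitude, longitude" wording.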


Should the 'Identifier of Location' (unique for dataset) be:
There was one vote since the poll was created on 13:40, 25 April 2019.

Should the 'Name of Location' be:
There was one vote since the poll was created on 13:35, 25 April 2019.

Should the 'Type of Location' be:
There was one vote since the poll was created on 13:36, 25 April 2019.

Should the 'Latitude of Location' be:
There was one vote since the poll was created on 13:36, 25 April 2019.

Should the 'Longitude of Location' be:
There was one vote since the poll was created on 13:36, 25 April 2019.

Should the 'Altitude/Elevation of Location' be:
There was one vote since the poll was created on 13:53, 25 April 2019.

Should the '(JSON)-Geometry of Location' be:
There was one vote since the poll was created on 13:36, 25 April 2019.

Should the 'Reference to Geonames' by ID be:
There was one vote since the poll was created on 13:36, 25 April 2019.

Should the corresponding extracted piece of the quote ('Phrase') be:
There was one vote since the poll was created on 13:36, 25 April 2019.


Time

The time of an event derived from historical documents is usually more precise than for other archive types; the uncertainty is often only days or even hours. It would therefore be best to code the absolute date in the Gregorian calendar as an ISO 8601 string. Optionally, an absolute Gregorian year (possibly as a floating-point value with a fractional part rather than an integer) can be calculated as well.
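The ISO 8601 string and the optional fractional Gregorian year can be related as sketched below. The function name is hypothetical, and the fraction is computed as elapsed time divided by the length of that calendar year, which is one common convention among several. Python's `datetime` only covers years 1 to 9999 of the proleptic Gregorian calendar.

```python
from datetime import datetime


def decimal_year(iso: str) -> float:
    """Gregorian year plus the elapsed fraction of that year, from an ISO 8601 string."""
    t = datetime.fromisoformat(iso)
    # Use the same time zone (or none) for the year boundaries as for the input.
    start = datetime(t.year, 1, 1, tzinfo=t.tzinfo)
    end = datetime(t.year + 1, 1, 1, tzinfo=t.tzinfo)
    # Dividing two timedeltas gives a float; leap years get 366 days automatically.
    return t.year + (t - start) / (end - start)
```

For example, `decimal_year("2020-07-02")` gives 2020.5, because 183 of the 366 days of the leap year 2020 have elapsed at midnight on 2 July.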

How should the event time primarily be defined:
There was one vote since the poll was created on 07:48, 25 April 2019.

The original sources often give dates in a calendar other than the Gregorian. The Julian calendar was used in Europe in historical times, and outside Europe other calendars existed or still exist, for example in Chinese or Arabic documents.
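Converting a Julian-calendar date to the Gregorian calendar can go through the Julian Day Number (JDN). This Python sketch uses the standard integer formula for the Julian calendar and Python's proleptic Gregorian ordinals (ordinal 1, i.e. 0001-01-01, is JDN 1721426); the function names are hypothetical. It assumes years numbered from AD 1 and 1 January as the start of the year, so regional New Year conventions found in the sources would need separate handling.

```python
from datetime import date


def julian_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date in the (proleptic) Julian calendar."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a             # shift to a year starting in March
    m = month + 12 * a - 3          # March = 0, ..., February = 11
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def jdn_to_gregorian(jdn: int) -> date:
    """Gregorian date for a JDN, via Python's proleptic Gregorian ordinals."""
    return date.fromordinal(jdn - 1721425)


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    return jdn_to_gregorian(julian_to_jdn(year, month, day))
```

For example, the Julian date 5 October 1582 maps to 15 October 1582, the first day of the Gregorian calendar.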

Should the time optionally be coded in 'Other Calendar Systems':
There was one vote since the poll was created on 13:37, 25 April 2019.

If so, which parameters for start and end should be considered?

Cycle:
There was one vote since the poll was created on 07:59, 25 April 2019.

Year:
There was one vote since the poll was created on 07:55, 25 April 2019.

Season / Solar Term:
There was one vote since the poll was created on 08:00, 25 April 2019.

Month:
There was one vote since the poll was created on 07:55, 25 April 2019.

Day:
There was one vote since the poll was created on 07:55, 25 April 2019.

Hour:
There was one vote since the poll was created on 08:00, 25 April 2019.

It is also possible to extract the time-related text from the quote and store it in a field 'phrase':

Phrase:
There was one vote since the poll was created on 08:05, 25 April 2019.
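One way to hold the start and end parameters listed above, plus the extracted phrase, is a pair of partially filled records. In this hypothetical Python sketch every unit is optional, the calendar is named by a free string, and `finest_unit` reports how precise a bound is; the field types (for example, an integer position within a cycle) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CalendarPoint:
    """One bound (start or end) in a non-Gregorian calendar; unknown parts stay None."""
    cycle: Optional[int] = None      # e.g. position within a calendar cycle
    year: Optional[int] = None
    season: Optional[str] = None     # season or solar term, as written in the source
    month: Optional[int] = None
    day: Optional[int] = None
    hour: Optional[int] = None

    def finest_unit(self) -> Optional[str]:
        # Walk from the finest to the coarsest unit and report the first one given.
        for unit in ("hour", "day", "month", "season", "year", "cycle"):
            if getattr(self, unit) is not None:
                return unit
        return None


@dataclass
class OtherCalendarTime:
    calendar: str                    # e.g. "julian", "chinese", "islamic"
    start: CalendarPoint
    end: CalendarPoint
    phrase: Optional[str] = None     # time-related text extracted from the quote
```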


Event itself

The event refers to the quote, possibly directly to the source, to the chronology/timing, and to the position/location. It also holds the coding of the phenomena, which is where it gets complicated (see the next section).
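Because the event can refer both to the quote and, optionally, directly to the source, the two links can contradict each other. A minimal Python sketch of a consistency check follows; the function name and the dictionary keys are hypothetical, and the records are plain dictionaries keyed by their identifiers.

```python
from typing import Dict, List


def check_event_references(event: dict, quotes: Dict[str, dict],
                           sources: Dict[str, dict],
                           locations: Dict[str, dict]) -> List[str]:
    """Return problems with an event's references; an empty list means all resolve.

    The keys "quoteId", "sourceId", "locationId" and "time" are hypothetical.
    """
    problems = []
    quote = quotes.get(event.get("quoteId"))
    if quote is None:
        problems.append(f"unknown quote {event.get('quoteId')!r}")
    source_id = event.get("sourceId")  # optional direct link to the source
    if source_id is not None:
        if source_id not in sources:
            problems.append(f"unknown source {source_id!r}")
        elif quote is not None and quote.get("sourceId") != source_id:
            # A direct source link is redundant and must agree with the quote's source.
            problems.append("event source differs from the source of its quote")
    if event.get("locationId") not in locations:
        problems.append(f"unknown location {event.get('locationId')!r}")
    if "time" not in event:
        problems.append("missing time")
    return problems
```

If the event only refers to the quote, the source can always be reached through it and this kind of contradiction cannot arise; a direct link saves one lookup at the cost of needing such a check.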

Should the Event refer to the 'Location' of the described phenomena:
There was one vote since the poll was created on 15:01, 25 April 2019.

Should the Event refer to the 'Timing' of the described phenomena:
There was one vote since the poll was created on 15:02, 25 April 2019.

Should the Event refer to the 'Quote' (by ID) in which the phenomena are described:
There was one vote since the poll was created on 15:02, 25 April 2019.

Should the Event refer directly to the 'Source' in which the quote describing the phenomena can be found:
There was one vote since the poll was created on 15:03, 25 April 2019.

Should the Event contain a 'DOI' when the dataset is published somewhere:
There was one vote since the poll was created on 15:04, 25 April 2019.

Should the Event contain a 'License' when the dataset is published somewhere:
There was one vote since the poll was created on 15:04, 25 April 2019.

Phenomena/Coding

Historical documents contain a wide variety of information, so the coding schema is complex. Some examples and their coding illustrate this:

  • The weather was extremely warm -> temperature index + very hot (+3)
  • A lot of rain fell in December -> precipitation type + rain & precipitation amount + more than usual (+1)
  • The water level was 13 feet -> water level + value:13.0 + tolerance:1.0 + unit:feet
  • The drought led to a bad harvest of potatoes -> harvest + harvest amount: less than usual (-1) & potatoes / precipitation amount: less than usual (-1)
  • The wheat harvest was little but the wine quality was good -> harvest + harvest amount: less than usual (-1) & wheat / harvest + harvest quality: better than usual (+1) & wine

Notation: "+" one-dimensional combination; "&" two- (or more) dimensional combination; "/" two different events (e.g. cause vs. effect).
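The notation above can be parsed mechanically. This Python sketch assumes that "/" separates events, "&" separates dimensions within an event, and "+" separates parts within a dimension, each written with surrounding spaces; this precedence is an assumption, not a documented rule, and the function names are hypothetical. Splitting on " + " with spaces keeps index values such as "(+3)" intact.

```python
import re
from typing import List

# Signed index values such as "(+3)" or "(-1)".
INDEX = re.compile(r"\(([+-]?\d+)\)")


def parse_coding(code: str) -> List[List[List[str]]]:
    """Split a coding string into events ("/"), dimensions ("&") and parts ("+")."""
    return [
        [[part.strip() for part in dim.split(" + ")] for dim in event.split(" & ")]
        for event in code.split(" / ")
    ]


def indices(code: str) -> List[int]:
    """All signed index values found in the coding string, in order."""
    return [int(value) for value in INDEX.findall(code)]
```

Applied to the potato example, `parse_coding` returns two events: `[['harvest', 'harvest amount: less than usual (-1)'], ['potatoes']]` for the harvest and `[['precipitation amount: less than usual (-1)']]` for the drought, and `indices` returns `[-1, -1]`.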

(Coding tree as used in tambora.org can be found here.)

(codeset_id,codeset_description,category,path,node_label,scale_label,scale_unit,value_label,value_index,average,variance,si_unit,si_average,si_variance )
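A coding tree with these columns could be read from CSV as sketched below. Which columns are numeric, and whether `value_index` is an integer, are assumptions; the function name `load_codeset` is hypothetical.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["codeset_id", "codeset_description", "category", "path", "node_label",
           "scale_label", "scale_unit", "value_label", "value_index", "average",
           "variance", "si_unit", "si_average", "si_variance"]
# Assumed numeric columns and their types; empty cells become None.
NUMERIC = {"value_index": int, "average": float, "variance": float,
           "si_average": float, "si_variance": float}


def load_codeset(text: str) -> List[Dict[str, object]]:
    """Parse coding-tree CSV text whose header contains all COLUMNS."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        for column, convert in NUMERIC.items():
            cell = row[column]
            row[column] = convert(cell) if cell not in ("", None) else None
        rows.append(dict(row))
    return rows
```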

See also SWEET: https://sweet.jpl.nasa.gov/

Example

Rename the extension of the LiPD file from .zip to .lpd.
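The renaming step can be scripted, for instance with Python's `pathlib`. LiPD files are zip archives with the `.lpd` extension, so only the name changes; the function name `zip_to_lpd` is hypothetical.

```python
from pathlib import Path


def zip_to_lpd(path: str) -> Path:
    """Rename a downloaded ".zip" file to the ".lpd" extension used by LiPD files."""
    source = Path(path)
    if source.suffix.lower() != ".zip":
        raise ValueError(f"expected a .zip file, got {source.name}")
    # with_suffix replaces only the last suffix: "a.tambora.2017.zip" -> "a.tambora.2017.lpd"
    target = source.with_suffix(".lpd")
    source.rename(target)
    return target
```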

Description | Tambora-Files | LiPD-Files | Remarks
Flood Example | Media:flood_tambora_csv.zip | Media:Exp0000.tambora.2017.zip |
To-Do | To-Do | To-Do |
To-Do | To-Do | To-Do |

Media in category "Historical Documents Working Group"

The following 6 files are in this category, out of 6 total.