Creating a Novel Geolocation Corpus from Historical TextsDownload PDF

14 Sept 2020OpenReview Archive Direct UploadReaders: Everyone
Abstract: This paper describes the process of annotating a historical US civil war corpus with geographic reference. Reference annotations are given at two different textual scales: individual place names and documents. This is the first published corpus of its kind in document-level geolocation, and it has over 10,000 disambiguated toponyms, double the amount of any prior toponym corpus. We outline many challenges and considerations in creating such a corpus, and we evaluate baseline and benchmark toponym resolution and document geolocation systems on it. Aspects of the corpus suggest several recommendations for proper annotation procedure for the tasks.
0 Replies

Loading