Abstract: This paper describes the process of annotating a historical US Civil War corpus
with geographic references. Reference annotations are given at two different textual
scales: individual place names and documents. This is the first published corpus
of its kind in document-level geolocation, and it has over 10,000 disambiguated
toponyms, double the number in any prior
toponym corpus. We outline many challenges and considerations in creating such
a corpus, and we evaluate baseline and
benchmark toponym resolution and document geolocation systems on it. Aspects of
the corpus suggest several recommendations for proper annotation procedure for
both tasks.