Abstract: TimeML is an annotation scheme for capturing temporal information in text. The developers of TimeML built the TimeBank
corpus to both validate the scheme and provide a rich dataset of events, temporal expressions, and temporal relationships for
training and testing temporal analysis systems. In our own work we have been developing methods that operate on TimeML graphs
to detect (and eventually automatically correct) temporal inconsistencies, extract timelines, and assess temporal
indeterminacy. In the course of this investigation we identified numerous previously unrecognized issues in the TimeBank
corpus, including multiple violations of TimeML annotation guide rules, incorrectly disconnected temporal graphs, and
inconsistent, redundant, missing, or otherwise incorrect annotations. We describe our methods for detecting and correcting
these problems, which include: (a) automatic guideline checking (109 violations); (b) automatic inconsistency checking (65
inconsistent files); (c) automatic disconnectivity checking (625 incorrect breakpoints); and (d) manual comparison with the
output of state-of-the-art automatic annotators to identify missing annotations (317 events, 52 temporal expressions). We provide
our code as well as a set of patch files that can be applied to the TimeBank corpus to produce a corrected version for use by other
researchers in the field.
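
To illustrate the kind of disconnectivity check named in (c), here is a minimal sketch. It assumes a TimeML document has already been reduced to a list of node ids (events and temporal expressions) and a list of (source, relation, target) link triples. The function name `connected_components` and this input format are hypothetical and are not taken from the released code; the sketch only finds disconnected subgraphs and does not decide whether a given break is incorrect.

```python
from collections import defaultdict


def connected_components(nodes, links):
    """Group node ids into connected components, treating links as undirected.

    nodes: iterable of node ids (hypothetical format, e.g. "e1", "t1")
    links: iterable of (source, relation, target) triples
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, _rel, b in links:
        # Include link endpoints even if they were missing from `nodes`.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    groups = defaultdict(set)
    for n in list(parent):
        groups[find(n)].add(n)
    # Sort for a stable, readable result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy example: two temporal links that never connect to each other.
nodes = ["e1", "e2", "e3", "t1"]
links = [("e1", "BEFORE", "e2"), ("e3", "INCLUDES", "t1")]
components = connected_components(nodes, links)
```

A document whose graph yields more than one component is a candidate for review: each gap between components marks a place where an annotator may have omitted a link that the text supports.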