Abstract: We present a deterministic sieve-based system for attributing
quotations in literary text and a new dataset: QuoteLi3. Quote attribution,
determining who said what in a given text, is important for tasks like
creating dialogue systems, and in newer areas like computational literary
studies, where it creates opportunities to analyze novels at scale rather
than only a few at a time. We release QuoteLi3, which contains more than
6,000 annotations linking quotes to speaker mentions and quotes to speaker
entities, and introduce a new algorithm for quote attribution. Our two-stage
algorithm first links quotes to mentions, then mentions to entities. Using
two stages encapsulates difficult sub-problems and improves system
performance. The modular design allows us to tune either for overall
performance or for the high precision appropriate for many use cases. Our
system achieves an average F-score of 87.5% across three novels,
outperforming previous systems, and can be tuned for precision of 90.4% at a
recall of 65.1%.
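
The two-stage design described above can be illustrated with a minimal sketch. This is not the system's actual implementation: the sieve functions, the regular expressions, the speech-verb list, and the alias table are all hypothetical and chosen only to show the control flow. Stage 1 runs an ordered list of deterministic sieves over the text that follows a quote and returns the first mention found; stage 2 maps that mention to an entity and abstains when it cannot.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical sieve type: takes the text following a quote and returns
# a speaker mention, or None if the sieve does not fire.
Sieve = Callable[[str], Optional[str]]

# Illustrative patterns only; a real system would use far richer cues.
VERBS = r"(?:said|replied|asked|cried)"
MENTION = r"((?:Mr\.|Mrs\.|Miss) [A-Z][a-z]+|[A-Z][a-z]+|he|she)"


def verb_then_mention(context: str) -> Optional[str]:
    """Fires on patterns such as: said Mr. Darcy"""
    m = re.match(rf"\s*{VERBS} {MENTION}", context)
    return m.group(1) if m else None


def mention_then_verb(context: str) -> Optional[str]:
    """Fires on patterns such as: Elizabeth replied"""
    m = re.match(rf"\s*{MENTION} {VERBS}", context)
    return m.group(1) if m else None


# Sieves are applied in order; the first one that fires decides.
QUOTE_TO_MENTION: List[Sieve] = [verb_then_mention, mention_then_verb]


def link_quote_to_mention(context: str,
                          sieves: List[Sieve] = QUOTE_TO_MENTION) -> Optional[str]:
    """Stage 1: link a quote to a speaker mention."""
    for sieve in sieves:
        mention = sieve(context)
        if mention is not None:
            return mention
    return None


def link_mention_to_entity(mention: Optional[str],
                           aliases: Dict[str, str]) -> Optional[str]:
    """Stage 2: link a mention to an entity; abstain if it is unknown."""
    if mention is None:
        return None
    return aliases.get(mention)


def attribute(context: str, aliases: Dict[str, str]) -> Optional[str]:
    """Run both stages on the text that follows a quote."""
    return link_mention_to_entity(link_quote_to_mention(context), aliases)


# Hypothetical alias table mapping surface mentions to entities.
ALIASES = {
    "Mr. Darcy": "Fitzwilliam Darcy",
    "Darcy": "Fitzwilliam Darcy",
    "Elizabeth": "Elizabeth Bennet",
}
```

In this sketch a pronoun such as "she" is found by stage 1 but left unattributed by stage 2, because the alias table has no entry for it. Abstaining like this, or removing less reliable sieves from the list, is one plausible way a modular pipeline could trade recall for precision.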