Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach

Published: 20 Mar 2023, Last Modified: 29 Mar 2023, NoDaLiDa 2023
TL;DR: We annotated a corpus of around 1500 Finnish news texts with quote and coreference information and tested two methods for automatic quote detection: a rule-based one (using dependency parsing) and a machine learning one (fine-tuned BERT).
Abstract: We approach the problem of recognizing and attributing quotes in Finnish news media. Solving this task would enable large-scale analysis of the media with respect to which voices and opinions are present and how they are presented. We describe the annotation of a corpus of around 1500 media articles with quote attribution and coreference information. Further, we compare two methods for automatic quote recognition: a rule-based one operating on dependency trees and a machine learning one built on top of the BERT language model. We conclude that BERT gives more promising results even with little training data, achieving an F-score of 95% for direct quotes and 84% for indirect quotes. Finally, we discuss open problems and further associated tasks, especially the need to resolve speaker mentions to entity references.
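The abstract does not spell out the rules used by the dependency-based method, so the following is only a minimal sketch of what such a detector might look like, assuming Universal Dependencies-style parses with CoNLL-U-like fields. It covers a single indirect-quote pattern: a speech verb whose `nsubj` dependent is taken as the speaker and whose `ccomp` clause is taken as the reported content. The `Token` class, the `find_indirect_quotes` function, the `SPEECH_VERBS` lexicon, and the example sentence are all hypothetical illustrations, not the authors' code or data.

```python
from dataclasses import dataclass

# Hypothetical speech-verb lexicon (lemmas); an assumption for illustration only.
SPEECH_VERBS = {"sanoa", "kertoa", "todeta", "arvioida", "vakuuttaa"}


@dataclass
class Token:
    id: int       # 1-based position in the sentence, as in CoNLL-U
    form: str
    lemma: str
    head: int     # id of the syntactic head; 0 marks the sentence root
    deprel: str   # Universal Dependencies relation label


def subtree_ids(tokens, root_id):
    """Ids of root_id and all of its descendants, in sentence order."""
    children = {}
    for t in tokens:
        children.setdefault(t.head, []).append(t.id)
    result, stack = set(), [root_id]
    while stack:
        i = stack.pop()
        result.add(i)
        stack.extend(children.get(i, []))
    return sorted(result)


def span_text(tokens, root_id):
    """Surface text of a subtree, skipping punctuation and the clause's own
    subordinator (e.g. 'että'), which is not part of the reported content."""
    by_id = {t.id: t for t in tokens}
    kept = []
    for i in subtree_ids(tokens, root_id):
        t = by_id[i]
        if t.deprel == "punct":
            continue
        if t.deprel == "mark" and t.head == root_id:
            continue
        kept.append(t.form)
    return " ".join(kept)


def find_indirect_quotes(tokens):
    """Return (speaker, quote) pairs for speech verbs that govern a ccomp clause.
    The speaker is None when the verb has no nsubj dependent."""
    quotes = []
    for verb in tokens:
        if verb.lemma not in SPEECH_VERBS:
            continue
        deps = [t for t in tokens if t.head == verb.id]
        speakers = [t for t in deps if t.deprel == "nsubj"]
        clauses = [t for t in deps if t.deprel == "ccomp"]
        for clause in clauses:
            speaker = span_text(tokens, speakers[0].id) if speakers else None
            quotes.append((speaker, span_text(tokens, clause.id)))
    return quotes


# Hand-built parse of "Virtanen sanoi, että hallitus tekee päätöksen."
# ("Virtanen said that the government will make a decision.")
sentence = [
    Token(1, "Virtanen", "Virtanen", 2, "nsubj"),
    Token(2, "sanoi", "sanoa", 0, "root"),
    Token(3, ",", ",", 6, "punct"),
    Token(4, "että", "että", 6, "mark"),
    Token(5, "hallitus", "hallitus", 6, "nsubj"),
    Token(6, "tekee", "tehdä", 2, "ccomp"),
    Token(7, "päätöksen", "päätös", 6, "obj"),
    Token(8, ".", ".", 2, "punct"),
]
print(find_indirect_quotes(sentence))  # [('Virtanen', 'hallitus tekee päätöksen')]
```

A sketch like this returns only the surface mention of the speaker ("Virtanen"); direct quotes, quotes spanning several sentences, and linking a pronoun speaker to an entity through coreference all fall outside it, which matches the open problems the abstract points to.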