NewsEdits 2.0: Learning the Intentions Behind Updating News

Anonymous

16 Feb 2024 | ACL ARR 2024 February Blind Submission | Readers: Everyone
Abstract: News articles are often published and republished. Their revision histories give us insights into the journalistic process and can assist in the development of computational journalism tools. They also make it challenging for large language models (LLMs) trained on news to reconcile conflicting, updated information. In this work, we release \textit{NewsEdits 2.0}, based on \newcite{spangher2022newsedits}'s large corpus of news article revision histories. \textit{NewsEdits 2.0} introduces a taxonomy of edit-intention categories, with coarse categories (Fact Updates, Stylistic Updates, and Contextual/Narrative Changes) and XX finer-grained categories. In the first part of our work, we collect ZZ human-labeled annotations on 600 revision pairs and show that we can model these categories using small, scalable ensemble models with a high F1 score (YY). In the second part of our work, we seek to predict, given old versions of news articles: \textit{Will this article have fact updates? Will it have style updates?} We show that, while pretrained LLMs fail at this task, fine-tuning can boost performance to YY accuracy. Finally, we show via a novel use case, \textit{Question Answering with outdated references}, that \textit{NewsEdits 2.0} should play an important role for users.
Paper Type: long
Research Area: Computational Social Science and Cultural Analytics
Contribution Types: Data resources, Data analysis
Languages Studied: English