Identifying Informational Sources in News Articles

Alexander Spangher; Nanyun Peng; Emilio Ferrara; Jonathan May

Published: 07 Oct 2023, Last Modified: 01 Dec 2023

Venue: EMNLP 2023 Main

Submission Type: Regular Long Paper

Submission Track: Computational Social Science and Cultural Analytics

Submission Track 2: NLP Applications

Keywords: computational journalism, source prediction, document-level modeling

TL;DR: We collect training data and train models to accurately identify a range of sources informing news writing. We show that news articles use sources in typical patterns that can be predicted.

Abstract: News articles are driven by the informational sources journalists use in reporting. Modeling when, how, and why sources get used together in stories can help us better understand the information we consume and even help journalists with the task of producing it. In this work, we take steps toward this goal by constructing the largest and widest-ranging annotated dataset to date of informational sources used in news writing. We first show that our dataset can be used to train high-performing models for information detection and source attribution. Then, we introduce a novel task, source prediction, to study the compositionality of sources in news articles, i.e., how they are chosen to complement each other. We show good modeling performance on this task, indicating that there is a pattern to the way different sources are used together in news storytelling. This insight opens the door for a focus on sources in narrative science (i.e., planning-based language generation) and computational journalism (e.g., a source-recommendation system to aid journalists writing stories). All data and model code can be found at https://github.com/alex2awesome/source-exploration.

Submission Number: 5018
