Explaining Mixtures of Sources in News Articles

Anonymous

16 Dec 2023 | ACL ARR 2023 December Blind Submission | Readers: Everyone
TL;DR: We introduce "source categorization" in news as a new task, curate eight schemas for classifying sources, and compare them on a large dataset.
Abstract: Writers often use different informational sources to inform storytelling, yet little is understood about why different sources are chosen. Are sources chosen primarily because they disagree? Because they represent different groups? In this work, we seek to explain why humans combine sources in news articles by comparing different schemas for information categorization. We adapt five existing schemas to the new task of source categorization and introduce three novel ones. For a given document, our goal is to identify the schema that best describes its sources. We do so by viewing the categorization implied by a schema as a latent variable assignment and choosing the assignment that maximizes the probability of observing the document. We find that two schemas, stance and social affiliation (a schema we introduce), best explain sourcing in the largest number of documents, but other schemas better explain sourcing for certain topics (e.g., NLI best describes fact-heavy topics like "Science"). Finally, we find that we can predict the optimal schema with moderate accuracy given just the headline of an article. This hints at an application to planning source retrieval in areas such as retrieval-augmented generation.
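
The selection rule described in the abstract can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the paper: $\mathcal{S}$ denotes the set of eight candidate schemas, $d$ a document, and $z^{(s)}$ the assignment of the sources in $d$ to the categories of schema $s$. The abstract does not specify how $p(d \mid z^{(s)})$ is modeled, so this is only a sketch of the stated criterion.

```latex
% Hypothetical notation: S = set of schemas, d = document,
% z^{(s)} = category assignment that schema s implies for the sources in d.
\[
  \hat{s}(d) \;=\; \arg\max_{s \in \mathcal{S}} \; p\bigl(d \mid z^{(s)}\bigr)
\]
```

Under this reading, the headline experiment amounts to approximating $\hat{s}(d)$ from the headline alone, without observing the sources in $d$.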
Paper Type: long
Research Area: Computational Social Science and Cultural Analytics
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis, Surveys, Theory
Languages Studied: English