Explaining Mixtures of Sources in News Articles

ACL ARR 2024 April Submission644 Authors

16 Apr 2024 (modified: 20 May 2024), ACL ARR 2024 April Submission, CC BY 4.0
Abstract: Human writers plan, _then_ write. For large language models (LLMs) to play a role in longer-form article generation, we must understand the planning steps humans take before writing. We explore one kind of planning, source selection in news, as a case study for evaluating plans in long-form generation. We ask: why do _specific_ stories call for _specific_ kinds of sources? We imagine a process where sources are selected to fall into different categories. Learning the article's _plan_ means predicting the categorization scheme chosen by the journalist. Inspired by latent-variable modeling, we first develop metrics to select the most likely plan underlying a story. Then, working with professional journalists, we adapt five existing approaches to planning and introduce three new ones. We find that two approaches, or schemas, _stance_ and _social affiliation_, best explain source plans in most documents. However, other schemas like _textual entailment_ explain source plans in factually rich topics like "Science". Finally, we find we can predict the most suitable schema given just the article's headline with reasonable accuracy. We see this as an important case study for human planning, one that provides a framework and approach for evaluating other kinds of plans, like discourse or plot-oriented plans. We release a corpus, _NewsSources_, with schema annotations for 4M articles, for further study.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: quantitative analyses of news and/or social media
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 644