Abstract: Human writers plan, _then_ write. For large language models (LLMs) to play a role in longer-form article generation, we must understand the planning steps humans take before writing. We explore one kind of planning, source selection in news, as a case study for evaluating plans in long-form generation. We ask: why do specific stories call for specific kinds of sources? We imagine a process in which sources are selected to fall into different categories. Learning the article's plan means predicting the categorization scheme chosen by the journalist. Inspired by latent-variable modeling, we first develop metrics to select the most likely plan underlying a story. Then, working with professional journalists, we adapt five existing approaches to planning and introduce three new ones. We find that two approaches, or schemas, stance and social affiliation, best explain source plans in most documents. However, other schemas, like textual entailment, explain source plans in factually rich topics like "Science". Finally, we find that we can predict the most suitable schema given just the article's headline with reasonable accuracy. We see this as an important case study for human planning, one that provides a framework and approach for evaluating other kinds of plans, like discourse or plot-oriented plans. We release a corpus, NewsSources, with schema annotations for 4M articles, for further study.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: quantitative analyses of news and/or social media
Contribution Types: Model analysis & interpretability, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English
Submission Number: 1176