Abstract: Information prioritization plays an important role in how humans perceive and understand the world. Homepage layouts serve as a tangible proxy for this prioritization. In this work, we present NewsHomepages, a large dataset of over 3,000 news website homepages (including local, national, and topic-specific outlets) captured twice daily over a three-year period. We develop models that perform pairwise comparisons between news items to infer the human preferences expressed in homepage layouts, achieving an F1 score above 0.8 in the majority of tested cases. We apply our models to rank-order a collection of local city council policies passed over a ten-year period in San Francisco, assessing their "newsworthiness". Our findings lay the groundwork for leveraging implicit organizational cues to deepen our understanding of information prioritization.
Paper Type: Long
Research Area: Computational Social Science and Cultural Analytics
Research Area Keywords: Resources and Evaluation, Computational Journalism, Computational Social Science
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis
Languages Studied: English, Japanese, Italian, Spanish, French, 30+ other languages
Submission Number: 2219