Measuring Diversity in Online News

21 Oct 2023, OpenReview Archive Direct Upload, Readers: Everyone
Abstract: The local news industry in the United States has been in decline in recent years, with newsrooms in the U.S. losing more than half of their staff over the last decade. Yet newspapers, especially those in the digital world, appear to be as abundant as ever, and surveys show that the general public believes local news to be thriving. In this paper, we investigate why this disconnect exists by studying the online newspaper landscape in the U.S. and quantifying the amount of content overlap in it. We present a novel technique that exploits shared structure in article URLs to detect identical news stories and uses these links between newspapers to automatically extract structures of media consolidation. Our technique is able to discover, with high precision, clusters of news websites owned by the same parent entity, as well as provide a lower bound on the proportion of content shared between them. Using a large dataset of news articles, we show that our approach is highly scalable and can also generalize to the international news landscape. We further extend this analysis to discover content-sharing that goes beyond media consolidation by using a large language model (GPT-3) to map news articles into vector embeddings and use them to quantify the amount of similarity, both lexical and semantic, in the news we consume every day.
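
As a rough illustration of the URL-based idea described in the abstract, the sketch below groups news domains that publish articles with the same URL slug. The abstract does not specify the actual matching rule, so everything here is an assumption made for illustration: the helper names `slug` and `cluster_by_shared_slugs`, the choice of the last path segment as the shared structure, the `min_shared` threshold, and the `*.example` domains. It is not the authors' implementation.

```python
from collections import defaultdict
from urllib.parse import urlparse


def slug(url):
    """Return the last non-empty path segment of a URL, lowercased.

    Assumption: syndicated copies of a story keep the same final path
    segment across sites. The paper's real rule may differ.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1].lower() if segments else ""


def cluster_by_shared_slugs(urls, min_shared=2):
    """Group domains that share at least `min_shared` article slugs.

    Returns a sorted list of clusters, each a sorted list of domains.
    Domains with no sufficiently strong link to another domain are omitted.
    """
    # Map each slug to the set of domains that published it.
    domains_by_slug = defaultdict(set)
    for url in urls:
        s = slug(url)
        if s:
            domains_by_slug[s].add(urlparse(url).netloc.lower())

    # Count how many slugs each pair of domains has in common.
    shared = defaultdict(int)
    for domains in domains_by_slug.values():
        ordered = sorted(domains)
        for i, a in enumerate(ordered):
            for b in ordered[i + 1:]:
                shared[(a, b)] += 1

    # Merge strongly linked domains with a small union-find structure.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in shared.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for domain in list(parent):
        clusters[find(domain)].add(domain)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical input: two sites share two stories, a third shares only one.
example_urls = [
    "https://a.example/news/2023/storm-hits-coast",
    "https://b.example/local/storm-hits-coast",
    "https://a.example/news/2023/council-vote",
    "https://b.example/local/council-vote",
    "https://c.example/storm-hits-coast",
    "https://c.example/own-story",
]
print(cluster_by_shared_slugs(example_urls))
```

Requiring more than one shared slug is a design choice in this sketch that favors precision: a single matching slug could be a coincidence or a one-off wire story, while repeated matches are more suggestive of common ownership. Counting only exact slug matches also means the shared-content estimate can only undercount, which is consistent with the abstract's framing of a lower bound.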