Fact from Fiction: Finding Serialized Novels in Newspapers

Pascale Feldkamp; Alie Lassche; Katrine Frøkjær Baunvig; Kristoffer Nielbo; Yuri Bizzoni

Published: 22 Jun 2025, Last Modified: 17 Jul 2025, Venue: ACL-SRW 2025 Oral, License: CC BY 4.0

Keywords: fictionality detection, stylistics, historical newspapers, semantic embeddings, affective dynamics

TL;DR: We identify fiction in 19th-century Danish newspapers with up to 0.91 F1 using linguistic cues like sentiment and information density. Results support scalable, corpus-level modeling of the fiction–nonfiction boundary.

Abstract: Digitized literary corpora of the 19th century favor canonical and novelistic forms, sidelining a broader and more diverse literary production. Serialized fiction – widely read but embedded in newspapers – remains especially underexplored, particularly in low-resource languages like Danish. This paper addresses this gap by developing methods to identify fiction in digitized Danish newspapers (1818–1848). We (1) introduce a manually annotated dataset of 1,394 articles and (2) evaluate classification pipelines using both selected linguistic features and embeddings, achieving F1-scores of up to 0.91. Finally, we (3) analyze feuilleton fiction via interpretable features to test its drift in discourse from neighboring nonfiction. Our results support the construction of alternative literary corpora and contribute to ongoing work on modeling the fiction–nonfiction boundary by operationalizing discourse-level distinctions at scale.
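For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an F1-score is computed for a binary fiction/nonfiction labeling task. It is not the authors' evaluation code: the function name `f1_score`, the label strings, and the toy labels are all assumptions made up for illustration, and the abstract does not say whether the reported 0.91 is a per-class, macro, or weighted average.

```python
def f1_score(gold, predicted, positive="fiction"):
    """Return (precision, recall, f1) for the class named by `positive`."""
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration only; not data from the paper.
gold = ["fiction", "fiction", "nonfiction", "nonfiction", "fiction", "nonfiction"]
predicted = ["fiction", "nonfiction", "nonfiction", "fiction", "fiction", "nonfiction"]

precision, recall, f1 = f1_score(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 balances precision against recall, a high score requires a classifier both to find most fiction articles and to avoid mislabeling nonfiction as fiction.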

Archival Status: Archival

ACL Copyright Transfer: pdf

Paper Length: Short Paper (up to 4 pages of content)

Submission Number: 156
