Abstract: Indirect speech is a fundamental yet understudied form of reported speech that plays a crucial role in literary texts and communication. While direct speech detection has received significant attention in computational linguistics, the automatic identification of indirect speech remains challenging because of its nuanced linguistic structure and contextual dependencies. This paper focuses on the detection of indirect speech in late 19th-century Scandinavian literature, where its presence has been linked to shifting aesthetic ideals. We present an annotated dataset of 150 segments, one randomly selected from each of 150 different novels, designed to capture indirect speech in Danish and Norwegian literature. We evaluate four pre-trained language models on indirect speech classification; a Danish Foundation Model (DFM Large), trained on extensive Danish data, achieves the highest performance.
Paper Type: Short
Research Area: Discourse and Pragmatics
Research Area Keywords: NLP tools for social analysis, discourse parsing, dialogue, conversation, discourse and multilinguality, corpus creation, benchmarking, language resources
Contribution Types: NLP engineering experiment, Data resources, Data analysis
Languages Studied: Danish, Norwegian
Submission Number: 608