Text2Story Lusa: A Dataset for Narrative Analysis in European Portuguese News Articles

Sérgio Nunes, Alípio Mário Jorge, Evelin Amorim, Hugo O. Sousa, Antonio Leal, Purificação Moura Silvano, Inês Cantante, Ricardo Campos

Published: 2024, Last Modified: 06 Jan 2026, LREC/COLING 2024, CC BY-SA 4.0

Abstract: Narratives have been the subject of extensive research across various scientific fields such as linguistics and computer science. However, the scarcity of freely available datasets, essential for studying this genre, remains a significant obstacle. Furthermore, datasets annotated with narrative components and their morphosyntactic and semantic information are even scarcer. To address this gap, we developed the Text2Story Lusa datasets, which consist of a collection of news articles in European Portuguese. The first dataset consists of 357 news articles, and the second dataset comprises a subset of 117 manually and densely annotated articles, totaling over 50 thousand individual annotations. By focusing on texts with substantial narrative elements, we aim to provide a valuable resource for studying narrative structures in European Portuguese news articles. On the one hand, the first dataset provides researchers with data to study narratives from various perspectives. On the other hand, the annotated dataset facilitates research in information extraction and related tasks, particularly in the context of narrative extraction pipelines. Both datasets are made available in adherence to the FAIR principles, thereby enhancing their utility within the research community.