EUR-Lex-Triples: A Legal Relation Extraction Dataset from European Legislation

Nihed Bendahman, Karen Pinel-Sauvagnat, Gilles Hubert, Mokhtar Boumedyen Billami

Published: 2025, Last Modified: 21 Jan 2026, TPDL 2025, CC BY-SA 4.0

Abstract: The field of Natural Language Processing (NLP) has witnessed exponential growth in recent years. With the rise of large language models, researchers have continuously expanded the scope of NLP tasks, pushing the boundaries of system capabilities. Nonetheless, some domains, such as the legal field, remain underexplored or progress at a slower pace. This is due to several challenges, including highly specialized terminology, the substantial variability of legislative texts across countries, which complicates generalization, and the limited availability of domain-specific resources and datasets. In this paper, we introduce EUR-Lex-Triples, a dataset annotated with triples in which legislative references are linked by relations capturing the legal modifications they undergo. This dataset is built on the existing EUR-Lex-Sum corpus, which is composed of document-summary pairs collected from the European Union’s legal platform EUR-Lex. EUR-Lex-Triples includes 9,193 unique triples (subject, relation/predicate, object). To the best of our knowledge, this is the first dataset of triples accessible for use in the legal domain. This resource aims to bridge existing gaps and foster progress in the legal NLP field. In particular, we explore potential applications, such as relation extraction and knowledge graph construction, which can be used in various NLP tasks, including fact-driven legal summarization and retrieval-augmented generation (RAG) systems.
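As a rough illustration of the (subject, relation/predicate, object) structure and of the knowledge graph construction mentioned in the abstract, the following is a minimal sketch. It does not reflect the dataset's actual schema or file format: the `Triple` type, the `build_graph` helper, the relation labels ("amends", "repeals"), and the placeholder act names are all assumptions made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """Hypothetical container for one (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str


def build_graph(triples):
    """Group unique triples by subject into a simple adjacency mapping."""
    graph = defaultdict(list)
    for t in set(triples):  # drop exact duplicates
        graph[t.subject].append((t.relation, t.obj))
    # Sort the outgoing edges so the result is deterministic.
    return {subject: sorted(edges) for subject, edges in graph.items()}


# Placeholder references and relation labels, not taken from the dataset.
triples = [
    Triple("Regulation A", "amends", "Directive B"),
    Triple("Regulation A", "repeals", "Regulation C"),
    Triple("Regulation A", "amends", "Directive B"),  # exact duplicate
]

graph = build_graph(triples)
for subject, edges in graph.items():
    for relation, obj in edges:
        print(f"{subject} --{relation}--> {obj}")
```

A mapping of this kind could then be queried for the acts that modify, or are modified by, a given reference, which is the sort of lookup a fact-driven summarizer or a RAG retriever might rely on.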

External IDs: dblp:conf/ercimdl/BendahmanPHB25