Track: Scientific Track
Keywords: parsing, legal document understanding, fedlex
Abstract: Understanding dependencies between legal provisions is essential for analyzing statutory corpora, yet such relationships are rarely available in machine-readable form. We present a hybrid pipeline for extracting article-level dependencies from Swiss federal legislation published on Fedlex, combining deterministic XML preprocessing with semantic resolution based on large language models (LLMs). Additionally, we release three complementary data splits (document-level JSON, structured citation candidates, and LLM-based article assignments) to support downstream legal NLP research.
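To make the deterministic preprocessing step more concrete, here is a minimal sketch of how citation candidates might be collected from a statute's XML. It assumes the documents follow the Akoma Ntoso vocabulary, with `<article eId="...">` elements containing `<ref href="...">` citations; the `CitationCandidate` record and `extract_candidates` function are hypothetical illustrations, not the authors' implementation. Candidates gathered this way would then be handed to the LLM-based resolution stage.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CitationCandidate:
    source_article: str  # eId of the article containing the citation (assumed attribute)
    target_href: str     # raw, unresolved href of the reference
    surface_text: str    # the citation as written in the statute text


def local_name(tag: str) -> str:
    # ElementTree prefixes namespaced tags with "{uri}"; keep only the local part.
    return tag.rsplit("}", 1)[-1]


def extract_candidates(xml_text: str) -> list[CitationCandidate]:
    """Collect every <ref> inside every <article>, regardless of namespace."""
    root = ET.fromstring(xml_text)
    candidates = []
    for article in root.iter():
        if local_name(article.tag) != "article":
            continue
        art_id = article.get("eId", "")
        for el in article.iter():
            if local_name(el.tag) == "ref":
                candidates.append(CitationCandidate(
                    source_article=art_id,
                    target_href=el.get("href", ""),
                    surface_text="".join(el.itertext()).strip(),
                ))
    return candidates


# Hypothetical toy document in Akoma Ntoso style.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <act><body>
    <article eId="art_1"><content><p>See <ref href="#art_2">Art. 2</ref>.</p></content></article>
    <article eId="art_2"><content><p>No references.</p></content></article>
  </body></act>
</akomaNtoso>"""

print(extract_candidates(SAMPLE))
```

Matching on local tag names keeps the sketch independent of the exact namespace version a given document declares.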
We evaluate our approach on 2,103 documents from the Classified Compilation of Federal Legislation (SR), yielding over 63,000 citation instances. While LLMs are effective at resolving semantically complex references, their structured output is substantially unreliable: approximately 21% of generated items violate the expected schema, and most of these errors are unrecoverable.
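A schema-violation rate like the one reported above can be measured by parsing each generated item and checking it against the expected structure. The sketch below assumes a hypothetical three-field schema (`citation_id`, `target_sr`, `target_article`, all strings) and counts an item as a violation if it is not valid JSON or does not match that schema exactly; the actual schema and violation criteria used in the paper may differ, and the sample outputs are invented for illustration.

```python
import json

# Hypothetical schema for one LLM article-assignment item: field name -> required type.
EXPECTED_FIELDS = {"citation_id": str, "target_sr": str, "target_article": str}


def is_valid_item(item: object) -> bool:
    """True if `item` is a dict with exactly the expected fields and types."""
    if not isinstance(item, dict) or set(item) != set(EXPECTED_FIELDS):
        return False
    return all(isinstance(item[k], t) for k, t in EXPECTED_FIELDS.items())


def violation_rate(raw_outputs: list[str]) -> float:
    """Fraction of generated items that fail to parse as JSON or violate the schema."""
    if not raw_outputs:
        return 0.0
    bad = 0
    for raw in raw_outputs:
        try:
            item = json.loads(raw)
        except json.JSONDecodeError:
            bad += 1  # not JSON at all
            continue
        if not is_valid_item(item):
            bad += 1  # parsed, but wrong fields or types
    return bad / len(raw_outputs)


outputs = [
    '{"citation_id": "c1", "target_sr": "210", "target_article": "art_12"}',  # valid
    '{"citation_id": "c2", "target_sr": "210"}',                              # missing field
    '{"citation_id": "c3", "target_sr": 210, "target_article": "art_3"}',     # wrong type
    "Article 12 of the Civil Code",                                           # not JSON
]
print(violation_rate(outputs))  # → 0.75
```

Distinguishing recoverable from unrecoverable violations (for example, by attempting a repair before counting) would require an additional step that this sketch does not model.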
Our findings highlight a key challenge in applying LLMs to structured legal information extraction, and the released resource supports tasks such as legal knowledge graph construction, citation analysis, and benchmarking structured prediction in the legal domain.
Submission Number: 37