Keywords: legal documents, knowledge graph construction, large language models
TL;DR: We propose NORKE, a framework that constructs rule-centric knowledge graphs from normative documents using ontology-constrained LLM extraction anchored on the caput of each legal article.
Abstract: Normative documents exhibit a well-defined logical structure but high semantic density, characterized by recurring terminology, cross-references, and hierarchical rule dependencies. These characteristics pose significant challenges for AI systems, which often struggle to retrieve factually grounded and contextually accurate information from such texts. Despite the growing body of research on legal Knowledge Graphs (KGs), little attention has been paid to structure-oriented KG construction that explicitly targets the extraction of normative rules from canonical legal elements. Existing approaches often emphasize entity extraction or case-based reasoning rather than modeling the document's internal normative architecture. We propose NORKE, a structure-aware framework for constructing KGs from normative documents within the Civil Law tradition. Our approach leverages the inherent normative hierarchy (Articles, Caput, Paragraphs, Items, and Sub-items) to extract and formalize rules as semantically cohesive RDF triples. It employs an ontology-guided extraction process in which a Large Language Model (LLM) assists in transforming structured normative segments into rule representations aligned with the domain ontology. To evaluate the framework, we constructed a benchmark dataset comprising 700 question–answer pairs drawn from real-world normative documents in Portuguese, including a subset of deliberately ambiguous queries curated by humans. Experimental results show that the KG built with NORKE supports factually grounded question answering, achieving an overall approval rate of 82.1% while maintaining stable effectiveness across different levels of linguistic complexity. These findings suggest that our rule-centric KG construction approach can improve traceability and factual grounding when retrieving information from normative documents.
Submission Number: 13