Keywords: Legal NLP, NLP Applications, Question Answering, Information Extraction, graph-based methods
TL;DR: We build the first temporal knowledge graph for Thai law, unifying 3,840 statutes with 87,394 court decisions, and show that graph-structured retrieval achieves a Citation F1 of 0.812 versus 0.685 for vector RAG on NitiBench-Tax while searching a 53x larger corpus.
Abstract: Jurisdictionally bound domains, such as law, often lack standardized, machine-readable data formats, requiring foundational infrastructure before downstream applications can succeed. We present ThLexGraph, the first unified temporal knowledge graph for Thai legal data, integrating 3,840 laws (6,273 versions) with 87,394 Supreme Court decisions, updated daily. The graph encodes hierarchy, temporal versioning, cross-references, and sequential order, all extracted from unstructured official sources where no structured representation previously existed. A five-setting comparison on NitiBench-Tax isolates data infrastructure as the sole variable: graph-structured retrieval achieves Citation F1 of 0.812 versus 0.666 for practitioner-standard web search and 0.685 for flat vector retrieval, while searching a corpus 53x larger. Trace analysis of 820 agent-issued queries reveals that hierarchy traversal and cross-reference following, capabilities absent from generic retrieval, are exercised in 50% and 16% of questions, respectively. Our system demonstrates that structured modeling of hierarchy, temporal versioning, cross-references, and sequential order can overcome structural limitations of legal data published without standardized formats.
Submission Type: Emerging
Copyright Form: pdf
Submission Number: 430
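Illustrative sketch (not part of the submission): the abstract names four kinds of structure encoded in the graph: hierarchy, temporal versioning, cross-references, and sequential order. The Python below is one minimal way such a structure could be represented and queried. Every class, function, field name, and section identifier here is a hypothetical placeholder chosen for illustration; none of it is taken from ThLexGraph's actual schema, and the texts are dummy strings rather than real legal content.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class SectionVersion:
    """One dated text of a section (temporal versioning)."""
    effective_from: date
    text: str
    cites: List[str] = field(default_factory=list)  # cross-references, by section id


@dataclass
class Section:
    """A node in the hierarchy; `order` is its position among siblings."""
    section_id: str
    parent_id: Optional[str]
    order: int
    versions: List[SectionVersion] = field(default_factory=list)

    def version_at(self, when: date) -> Optional[SectionVersion]:
        """Return the latest version already in force on `when`, or None."""
        in_force = [v for v in self.versions if v.effective_from <= when]
        return max(in_force, key=lambda v: v.effective_from, default=None)


def ancestors(graph: Dict[str, Section], section_id: str) -> List[str]:
    """Hierarchy traversal: walk parent links up to the root."""
    path = []
    parent = graph[section_id].parent_id
    while parent is not None:
        path.append(parent)
        parent = graph[parent].parent_id
    return path


def siblings_in_order(graph: Dict[str, Section], parent_id: str) -> List[str]:
    """Sequential order: children of one parent, sorted by position."""
    children = [s for s in graph.values() if s.parent_id == parent_id]
    return [s.section_id for s in sorted(children, key=lambda s: s.order)]


def cited_sections(graph: Dict[str, Section], section_id: str, when: date) -> List[str]:
    """Cross-reference following, restricted to the version in force on `when`."""
    version = graph[section_id].version_at(when)
    return [] if version is None else [c for c in version.cites if c in graph]


# Hypothetical toy graph: one act, one chapter, two sections.
graph = {
    "act": Section("act", None, 0),
    "act/ch1": Section("act/ch1", "act", 1),
    "act/ch1/s1": Section("act/ch1/s1", "act/ch1", 1, [
        SectionVersion(date(2000, 1, 1), "placeholder v1"),
        SectionVersion(date(2015, 6, 1), "placeholder v2", cites=["act/ch1/s2"]),
    ]),
    "act/ch1/s2": Section("act/ch1/s2", "act/ch1", 2, [
        SectionVersion(date(2000, 1, 1), "placeholder"),
    ]),
}
```

In this toy setup, asking for `graph["act/ch1/s1"].version_at(date(2010, 1, 1))` selects the earlier of the two placeholder versions, so a cross-reference added only in the later version is not followed for a 2010 query date. That is the kind of date-sensitive lookup a flat vector index does not express directly.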