Keywords: RAG, Law, Hallucination
Abstract: Retrieval-augmented generation (RAG) promises to bridge complex legal statutes and public understanding, yet hallucination remains a critical barrier in real-world use. Because statutes evolve and provisions frequently cross-reference, maintaining *temporal currency* and *citation awareness* is essential, favoring up-to-date sources over static parametric memory.
To study these issues, we focus on the under-examined domain of South Korean fire safety regulation, a complex web of fragmented legislation, dense cross-references, and vague decrees. We introduce **SearchFireSafety**, the first RAG-oriented question-answering (QA) resource for this domain. It includes: (i) 941 real-world, open-ended QA pairs from public inquiries (2023–2025); (ii) a corpus of 4,437 legal documents from 117 statutes with a citation graph; and (iii) synthetic single-hop (Yes/No) and multi-hop (MCQA) benchmarks targeting legal reasoning and uncertainty.
Experiments with four retrieval strategies and five Korean-capable LLMs show that: (1) multilingual dense retrievers excel due to the domain's mix of Korean, English loanwords, and Sino-Korean terms (i.e., Chinese characters); (2) grounding LLMs with SearchFireSafety substantially improves factual accuracy; but (3) multi-hop reasoning still fails to resolve conflicting provisions or recognize informational gaps. Our results affirm that RAG is necessary but not yet sufficient for legal QA, and we offer SearchFireSafety as a rigorous testbed to drive progress in Legal AI.
All data resources are labeled using a novel and transparent annotation pipeline, available at: https://anonymous.4open.science/r/SearchFireSafety-C2AB/.
Primary Area: datasets and benchmarks
Submission Number: 19077