STARD: A Statute Retrieval Dataset for Layperson Queries

ACL ARR 2024 April Submission 525 Authors

16 Apr 2024 (modified: 08 May 2024) | ACL ARR 2024 April Submission | CC BY 4.0
Abstract: Statute retrieval aims to find relevant statutory articles for specific queries. This process underlies a wide range of legal applications, such as legal advice, automated judicial decisions, and logical legal analysis. Existing statute retrieval benchmarks emphasize formal legal queries from sources like bar exams and Supreme Court cases. This neglects layperson queries, which often lack precise legal terminology and reference legal concepts ambiguously. In this study, we introduce the STAtute Retrieval Dataset (STARD), a dataset derived from real-world legal consultation questions asked by the general public. Unlike existing statute retrieval datasets, which focus predominantly on professional legal queries, STARD captures the complexity and diversity of layperson queries. Through a comprehensive evaluation of various retrieval baselines, including conventional methods and those employing advanced techniques such as GPT-4, we find that none of the existing retrieval approaches achieves optimal results. Additionally, we show that employing STARD as a Retrieval-Augmented Generation (RAG) dataset markedly improves LLMs' performance on legal tasks, indicating that STARD is a pivotal resource for developing more accessible and effective legal systems.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: Information Retrieval, NLP for legal applications
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: Chinese, English
Submission Number: 525