ThaiLegal: Benchmarking LLM Frameworks on Thai Legal Question Answering Capabilities

ACL ARR 2025 May Submission7750 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large language models (LLMs) show promise in legal question answering (QA), yet Thai legal QA systems face challenges due to limited data and complex legal structures. We introduce ThaiLegal, a novel benchmark featuring two datasets: (1) ThaiLegal-CCL, covering Thai financial laws, and (2) ThaiLegal-Tax, containing Thailand's official tax rulings. Our benchmark also includes specialized evaluation metrics suited for Thai legal QA. We evaluate retrieval-augmented generation (RAG) and long-context LLM (LCLM) approaches across three key dimensions: (1) the benefits of domain-specific techniques such as hierarchy-aware chunking and cross-referencing, (2) the comparative performance of RAG components, e.g., retrievers and LLMs, and (3) the potential of long-context LLMs to replace traditional RAG systems. Our results reveal that domain-specific components yield only slight improvements over naive methods. At the same time, existing retrieval models still struggle with complex legal queries, and long-context LLMs show limitations in consistent legal reasoning. Our study highlights current limitations in Thai legal NLP and lays a foundation for future research in this emerging domain.
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: benchmarking,legal NLP,datasets for low resource languages,retrieval-augmented generation,domain adaptation,question answering,Interdisciplinary Recontextualization of NLP
Contribution Types: Data resources, Data analysis
Languages Studied: Thai
Submission Number: 7750