ThaiLegal: Benchmarking LLM Frameworks on Thai Legal Question Answering Capabilities

ACL ARR 2025 May Submission7750 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Large language models (LLMs) show promise in legal question answering (QA), yet Thai legal QA systems face challenges due to limited data and complex legal structures. We introduce ThaiLegal, a novel benchmark featuring two datasets: (1) ThaiLegal-CCL, covering Thai financial laws, and (2) ThaiLegal-Tax, containing Thailand's official tax rulings. Our benchmark also includes specialized evaluation metrics suited for Thai legal QA. We evaluate retrieval-augmented generation (RAG) and long-context LLM (LCLM) approaches across three key dimensions: (1) the benefits of domain-specific techniques such as hierarchy-aware chunking and cross-referencing, (2) the comparative performance of RAG components, e.g., retrievers and LLMs, and (3) the potential of long-context LLMs to replace traditional RAG systems. Our results reveal that domain-specific components yield only slight improvements over naive methods. At the same time, existing retrieval models still struggle with complex legal queries, and long-context LLMs show limitations in consistent legal reasoning. Our study highlights current limitations in Thai legal NLP and lays a foundation for future research in this emerging domain.
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: benchmarking,legal NLP,datasets for low resource languages,retrieval-augmented generation,domain adaptation,question answering,Interdisciplinary Recontextualization of NLP
Contribution Types: Data resources, Data analysis
Languages Studied: Thai
Submission Number: 7750