Abstract: Large language models (LLMs) show promise in legal question answering (QA), yet Thai legal QA systems face challenges due to limited data and complex legal structures. We introduce ThaiLegal, a novel benchmark featuring two datasets: (1) the ThaiLegal-CCL Dataset, covering Thai financial laws, and (2) the ThaiLegal-Tax Dataset, containing Thailand's official tax rulings. Our benchmark also provides specialized evaluation metrics suited to Thai legal QA. We evaluate retrieval-augmented generation (RAG) and long-context LLM (LCLM) approaches along three key dimensions: (1) the benefits of domain-specific techniques such as hierarchy-aware chunking and cross-referencing, (2) the comparative performance of RAG components (e.g., retrievers and LLMs), and (3) the potential of long-context LLMs to replace traditional RAG systems. Our results reveal that domain-specific components yield only slight improvements over naive methods, while existing retrieval models still struggle with complex legal queries and long-context LLMs show limitations in consistent legal reasoning. Our study highlights current limitations in Thai legal NLP and lays a foundation for future research in this emerging domain.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, legal NLP, datasets for low resource languages, retrieval-augmented generation, domain adaptation, logical reasoning
Contribution Types: Data resources, Data analysis
Languages Studied: Thai
Submission Number: 3272