ThaiInstruct: An instruction-following Dataset for Culturally-Aware, Multitask, and Multi-domain Evaluation in Thai

ACL ARR 2025 May Submission1835 Authors

18 May 2025 (modified: 03 Jul 2025), ACL ARR 2025 May Submission, CC BY 4.0

Abstract: Large language models excel at instruction-following in English, but their performance in low-resource languages like Thai remains underexplored. Existing benchmarks often rely on translations, missing the cultural and domain-specific nuances needed for real-world use. We present ThaiInstruct, a human-authored Thai dataset for evaluation and instruction tuning, covering four professional domains and seven task types. Created through a multi-stage quality control process involving annotators, domain experts, and AI researchers, ThaiInstruct supports two studies: (1) a zero-shot evaluation showing performance gaps on culturally and professionally specific tasks, and (2) an instruction tuning study with ablations isolating the effect of native supervision. Models fine-tuned on ThaiInstruct outperform those fine-tuned on translated data on both in-domain and out-of-domain benchmarks. These findings underscore the need for culturally and professionally grounded instruction data to improve LLM alignment in low-resource, linguistically diverse settings.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: corpus creation; benchmarking; evaluation; datasets for low resource languages

Contribution Types: Model analysis & interpretability, NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources, Data analysis

Languages Studied: Thai

Submission Number: 1835
