MalayMMLU: A Multitask Benchmark for the Low-Resource Malay Language

ACL ARR 2024 June Submission 2050 Authors

15 Jun 2024 (modified: 12 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: Large Language Models (LLMs) exhibit advanced proficiency in language reasoning and comprehension across a wide array of languages. While their performance is notably robust in well-resourced languages, their capabilities in low-resource languages such as Bahasa Malaysia (hereinafter referred to as Malay) remain underexplored owing to a scarcity of dedicated studies and benchmarks. To improve our understanding of LLM performance in Malay, we introduce MalayMMLU, the first multitask language understanding benchmark for this language. The benchmark comprises 24,213 questions spanning both primary (Year 1-6) and secondary (Form 1-5) education levels in Malaysia, covering 5 broad topics that are further divided into 22 subjects. Using this benchmark, we conduct an empirical evaluation of 18 LLMs, assessing their proficiency in Malay as well as in the nuanced contexts of Malaysian culture. We will release the MalayMMLU benchmark and the corresponding code publicly upon paper acceptance.
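The submission page does not include the evaluation code, so as a rough illustration only: MMLU-style benchmarks are commonly scored by comparing the log-likelihood a causal language model assigns to each answer option and taking the highest-scoring one. The sketch below follows that convention; the checkpoint name, the example question, and the scoring details are placeholder assumptions, not the authors' actual setup.

```python
# Minimal sketch of MMLU-style multiple-choice scoring (not the authors' code).
# Each option is appended to the question; the option whose tokens receive the
# highest total log-probability under the model is taken as the prediction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder checkpoint; any causal LM would work here
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def score_option(prompt: str, option: str) -> float:
    """Sum of log-probs the model assigns to the option tokens given the prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    total = 0.0
    # The token at position i is predicted by the logits at position i - 1.
    for i in range(prompt_ids.shape[1], full_ids.shape[1]):
        total += log_probs[0, i - 1, full_ids[0, i]].item()
    return total

# Hypothetical Malay question ("The capital of Malaysia is").
question = "Ibu negara Malaysia ialah"
options = {"A": " Kuala Lumpur", "B": " Jakarta", "C": " Bangkok", "D": " Manila"}
pred = max(options, key=lambda k: score_option(question, options[k]))
print("Predicted:", pred)
```

Under this setup, benchmark accuracy would simply be the fraction of the 24,213 questions for which the predicted option matches the gold answer.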
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: datasets for low-resource languages; benchmarking; language resources; NLP datasets
Contribution Types: Approaches to low-resource settings, Data resources, Data analysis
Languages Studied: Malay; Bahasa Malaysia
Submission Number: 2050