Keywords: low-resource languages, computational linguistics
TL;DR: Multilingual LLMs perform well on high-resource languages like Farsi but struggle with low-resource Indo-Iranian languages. We benchmark models, reveal major gaps, and highlight the need for more data and better evaluation.
Abstract: Multilingual large language models (LLMs) have achieved strong performance in high-resource languages, yet their capabilities in low-resource settings remain underexplored. This gap is particularly severe for several Indo-Iranian languages spoken across Muslim communities, such as Farsi/Dari, Pashto, Kurdish, Balochi, Mazandarani, Gilaki, Luri, and Ossetian. These languages represent tens of millions of speakers but receive limited attention in NLP research. In this paper, we present a systematic pilot evaluation of modern multilingual LLMs across six Indo-Iranian languages spanning high-, medium-, and low-resource levels. We assemble small evaluation sets from publicly available resources (Quran translations, Wikipedia, and parallel corpora), define three evaluation tasks (translation, factual question answering, and sentiment classification), and run a reproducible, open experimental protocol comparing open-source models (mBERT, mT5-small, BLOOM-560M) with closed-source APIs (GPT-4, Google Translate). Our analysis highlights a large performance gap between Farsi and more regional/minority languages (Mazandarani, Gilaki, Ossetian), documents common failure modes (cultural mistranslation, hallucination, and dialect confusion), and proposes practical steps toward closing the gap, including community-led data collection and lightweight adaptation techniques.
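To make the evaluation protocol concrete, below is a minimal Python sketch of the translation track. The checkpoint name (google/mt5-small) matches a model named in the abstract, but the language pair, prompt prefix, and test sentences are hypothetical placeholders, and a raw mT5 checkpoint would need translation fine-tuning before producing usable output; this illustrates the scoring harness, not the paper's exact pipeline.

```python
# Minimal sketch of the translation-evaluation harness (one of the three
# tasks in the abstract). The model name is real; the language pair,
# prompt prefix, and sentences are hypothetical placeholders, and raw
# mT5 requires fine-tuning before it translates reliably.
import sacrebleu
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "google/mt5-small"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def translate(sentences, prefix="translate English to Gilaki: "):
    # Batch-encode the prompted sources, generate, and decode outputs.
    inputs = tokenizer([prefix + s for s in sentences],
                       return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

# Hypothetical parallel test pair; a real run would load the Quran /
# Wikipedia evaluation sets described in the abstract.
sources = ["Water is essential for life."]
references = ["<gold Gilaki translation>"]

hypotheses = translate(sources)
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.1f}")
```

sacrebleu is a natural choice here because it computes standardized, tokenization-independent BLEU; for morphologically rich Indo-Iranian languages, chrF (sacrebleu.corpus_chrf) is a common complementary metric.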
Track: Track 2: ML by Muslim Authors
Submission Number: 51