Abstract: This paper introduces a benchmark for evaluating Indigenous language knowledge in large language models using zero- and few-shot prompting. The benchmark includes three tasks: (1) language identification, (2) cloze completion of Spanish sentences aided by Indigenous-language translations, and (3) grammatical feature classification. We apply the benchmark to 13 Indigenous languages, including Bribri, Guarani, and Nahuatl, and evaluate models from five major families (GPT, Gemini, DeepSeek, Qwen, and LLaMA). Results reveal large differences across both languages and model families: a small subset of model–language combinations performs consistently well across tasks, while the remaining combinations stay close to random chance.
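As a rough illustration of the zero-shot setting the abstract describes, the language-identification task can be sketched as below. The prompt template, option list, and exact-match scoring are assumptions for illustration, not the authors' actual protocol.

```python
# Hypothetical sketch of zero-shot language identification over the 13
# languages studied. The prompt wording and scoring are assumptions.

LANGUAGES = [
    "Ashaninka", "Awajun", "Aymara", "Bribri", "Chatino", "Guarani",
    "Nahuatl", "Otomi", "Quechua", "Raramuri", "Shipibo-Konibo",
    "Wayuu", "Wixarika",
]

def zero_shot_language_id_prompt(sentence: str) -> str:
    """Build a zero-shot prompt asking the model to name the language."""
    options = ", ".join(LANGUAGES)
    return (
        "Which of the following languages is this sentence written in?\n"
        f"Options: {options}\n"
        f"Sentence: {sentence}\n"
        "Answer with the language name only."
    )

def is_correct(model_output: str, gold_language: str) -> bool:
    """Exact-match scoring after trimming and case-folding."""
    return model_output.strip().lower() == gold_language.strip().lower()
```

A few-shot variant would simply prepend labeled sentence–language pairs before the query; chance accuracy on this 13-way task is about 7.7%, which is the baseline the abstract's "random chance" refers to.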
Paper Type: Short
Research Area: Multilingualism and Cross-Lingual NLP
Research Area Keywords: probing, multilingualism, indigenous languages
Contribution Types: Model analysis & interpretability
Languages Studied: Asháninka, Awajun, Aymara, Bribri, Chatino, Guarani, Nahuatl, Otomí, Quechua, Rarámuri, Shipibo-Konibo, Wayuu, Wixárika
Submission Number: 1003