Do Large Language Models Speak All Languages Equally? A Comparative Study in Low-Resource Settings

Anonymous

16 Feb 2024 · ACL ARR 2024 February Blind Submission · Readers: Everyone
Abstract: Large language models (LLMs) have garnered significant interest in natural language processing (NLP), particularly for their remarkable performance on various downstream tasks in resource-rich languages such as English. However, the applicability and efficacy of LLMs in low-resource language contexts remain largely unexplored, highlighting a notable gap in linguistic capabilities for these languages. The limited utilization of LLMs in low-resource scenarios is primarily attributable to constraints such as dataset scarcity, computational costs, and research lacunae specific to low-resource languages. To address this gap, we comprehensively examine zero-shot learning using multiple LLMs in both English and low-resource languages. Our findings indicate that GPT-4 consistently outperforms Llama 2 and Gemini, and that English consistently yields superior performance across diverse tasks compared to low-resource languages. Furthermore, our analysis reveals that, among the evaluated tasks, natural language inference (NLI) achieves the highest performance, with GPT-4 again demonstrating the strongest capabilities. This research underscores the importance of assessing LLMs in low-resource language contexts to broaden their applicability in general-purpose NLP applications.
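To make the evaluation setup concrete, below is a minimal sketch of zero-shot NLI prompting of the kind the abstract describes. It assumes access to the OpenAI chat completions API; the prompt template, label set, and example sentences are illustrative assumptions, not the paper's actual evaluation protocol.

```python
# Minimal sketch of zero-shot NLI prompting, as described in the abstract.
# The prompt template, label set, and model name below are illustrative
# assumptions, not the paper's actual evaluation protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

NLI_PROMPT = (
    "Premise: {premise}\n"
    "Hypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? "
    "Answer with exactly one word: entailment, contradiction, or neutral."
)

def zero_shot_nli(premise: str, hypothesis: str, model: str = "gpt-4") -> str:
    """Query a chat model with a single zero-shot NLI instance."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": NLI_PROMPT.format(
            premise=premise, hypothesis=hypothesis)}],
        temperature=0,  # deterministic decoding for evaluation
    )
    return response.choices[0].message.content.strip().lower()

# The same template can be rendered in Bangla, Hindi, or Urdu to compare
# per-language accuracy against English, as the study does.
print(zero_shot_nli("A man is playing a guitar.", "A person is making music."))
```

Swapping in Llama 2 or Gemini endpoints and translating the template into each studied language would yield the cross-model, cross-language comparison the abstract reports.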
Paper Type: long
Research Area: Multilinguality and Language Diversity
Contribution Types: Model analysis & interpretability, Approaches to low-resource settings
Languages Studied: English, Bangla, Hindi, and Urdu