Grammatical Error Correction for Low-Resource Languages: The Case of Zarma

ACL ARR 2024 June Submission 2291 Authors

15 Jun 2024 (modified: 22 Jul 2024), ACL ARR 2024 June Submission, CC BY 4.0
Abstract: Zarma is a Nilo-Saharan language spoken predominantly in West Africa. The limited availability of annotated data and the need for standardized orthography make grammatical error correction (GEC) particularly challenging for Zarma. This study presents a comparative analysis of GEC methods for Zarma, covering classical rule-based methods, machine translation (MT) models, and state-of-the-art large language models (LLMs). Through rigorous evaluations, we compare the strengths and limitations of each method, assessing how effectively each identifies and corrects errors in Zarma texts. Our findings highlight the promising potential of both LLMs and MT models to significantly enhance GEC capabilities for low-resource languages, paving the way for more inclusive and robust NLP tools for African languages.
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: Natural Language Processing (NLP), Grammatical Error Correction (GEC), Low-Resource Languages, Zarma Language, Bambara Language, Machine Translation (MT), Large Language Models (LLMs), Rule-Based Methods, Zero-Shot Learning
Contribution Types: Approaches to low-resource settings, Publicly available software and/or pre-trained models, Data resources
Languages Studied: Zarma, Bambara
Submission Number: 2291