Leveraging NLLB for Low-Resource Bidirectional Amharic–Afan Oromo Machine Translation

ICLR 2026 Conference Submission 18715 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Neural Machine Translation (NMT), No Language Left Behind (NLLB), Natural Language Processing (NLP), Multilingual NLP
TL;DR: We optimize NLLB for low-resource Amharic–Afan Oromo translation, achieving large gains over Google Translate and baseline models, with BLEU scores above 42 in both directions.
Abstract: We present a bidirectional machine translation system for Amharic and Afan Oromo, two low-resource Ethiopian languages critical for cultural and linguistic accessibility. We curated a high-quality parallel corpus of 667,021 human-edited sentence pairs, preprocessed through text normalization, non-target-language filtering, and stratified splitting into training, validation, and test sets. Using the Hugging Face Transformers library, we fine-tuned a pre-trained sequence-to-sequence transformer architecture, optimized for the linguistic nuances of Ethio-Semitic and Cushitic languages, with tokenized input and dynamic padding for efficient batch processing. Our model significantly outperforms baselines, including Google Translate and NLLB models (600M, 1.3B, 3.3B parameters), which represent industry and research state-of-the-art for low-resource translation. For Amharic-to-Afan Oromo, it achieves a BLEU score of 42.19, surpassing Google Translate's 9.6. For Afan Oromo-to-Amharic, it scores 42.82, exceeding NLLB-3.3B's 5.72. Additional metrics (CHRF++, BERTScore) and low loss values confirm its robustness. These results highlight the efficacy of tailored fine-tuning for low-resource language pairs, advancing cross-lingual communication and digital accessibility in multilingual societies.
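
Note: The following is a minimal sketch of the kind of fine-tuning pipeline the abstract describes (fine-tuning an NLLB checkpoint with Hugging Face Transformers, tokenized inputs, and dynamic padding). The checkpoint choice, file names, column names, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

# Sketch: fine-tune an NLLB checkpoint for Amharic -> Afan Oromo translation
# with Hugging Face Transformers. Paths and hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Smallest public NLLB checkpoint; the paper also evaluates 1.3B and 3.3B variants.
checkpoint = "facebook/nllb-200-distilled-600M"
# NLLB uses FLORES-200 language codes: amh_Ethi (Amharic), gaz_Latn (Afan Oromo).
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint, src_lang="amh_Ethi", tgt_lang="gaz_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical JSON-lines parallel corpus with "am" and "om" text columns.
raw = load_dataset(
    "json", data_files={"train": "train.jsonl", "validation": "valid.jsonl"}
)

def preprocess(batch):
    # Tokenize source and target; truncation bounds sequence length, while
    # padding is deferred to the collator so each batch is padded dynamically.
    return tokenizer(
        batch["am"], text_target=batch["om"], max_length=128, truncation=True
    )

tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)

# Pads each batch to the longest sequence in that batch (dynamic padding).
collator = DataCollatorForSeq2Seq(tokenizer, model=model)

args = Seq2SeqTrainingArguments(
    output_dir="nllb-amh-gaz",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
    tokenizer=tokenizer,
)
trainer.train()

The reverse direction (Afan Oromo to Amharic) would follow the same recipe with src_lang and tgt_lang swapped.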
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 18715