From Scarcity to Efficiency: Investigating the Effects of Data Augmentation on African Machine Translation

Published: 14 Dec 2025, Last Modified: 10 Jan 2026
Venue: LM4UC@AAAI2026
License: CC BY 4.0
Keywords: African machine translation, neural machine translation, data augmentation for neural machine translation, switchout, sentence concatenation, back translation
TL;DR: This work investigates the impact of switchout and sentence concatenation with back translation on machine translation for six low-resource African languages.
Abstract: The linguistic diversity of the African continent poses distinct challenges and opportunities for machine translation. Given the scarcity of labelled data for many African languages, this study explores the efficacy of data augmentation techniques for improving translation systems in low-resource settings. We focus on two techniques, 'Sentence Concatenation with Back Translation' and 'Switch-Out data augmentation', and apply them to six African languages. In addition, we analyse the performance of these techniques in both data-efficient and data-constrained scenarios for selected languages. Our experiments show significant improvements in machine translation performance, with an increase of at least 25% in BLEU score across all six languages. These results highlight the potential of these techniques for building more robust translation systems for under-resourced languages. We provide a comprehensive analysis and discuss the broader implications of our findings for future research in machine translation.
Submission Number: 12
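
The abstract names two augmentation techniques without describing their mechanics. The sketch below is a generic illustration, not the authors' implementation: it assumes a SwitchOut-style scheme in which the number of replaced tokens n is sampled with probability proportional to exp(-n / tau) and replacements are drawn uniformly from a vocabulary, and it assumes sentence concatenation means joining adjacent sentence pairs (in practice the source side of some pairs would come from a back-translation model, which is omitted here). All function names and the temperature parameter `tau` are hypothetical.

```python
import math
import random


def sample_num_swaps(length, tau, rng):
    """Sample how many tokens to replace, with p(n) proportional to exp(-n / tau)."""
    weights = [math.exp(-n / tau) for n in range(length + 1)]
    return rng.choices(range(length + 1), weights=weights, k=1)[0]


def switchout(tokens, vocab, tau, rng):
    """Replace a random subset of tokens with words drawn uniformly from vocab.

    Assumption: a replacement may coincide with the original token.
    """
    out = list(tokens)
    n = sample_num_swaps(len(out), tau, rng)
    for i in rng.sample(range(len(out)), n):
        out[i] = rng.choice(vocab)
    return out


def concat_pairs(pairs, sep=" "):
    """Join each (source, target) pair with the next one to form longer examples."""
    return [
        (s1 + sep + s2, t1 + sep + t2)
        for (s1, t1), (s2, t2) in zip(pairs, pairs[1:])
    ]


def relative_gain(baseline_bleu, new_bleu):
    """Relative BLEU change in percent, the form in which the abstract reports gains."""
    return 100.0 * (new_bleu - baseline_bleu) / baseline_bleu


if __name__ == "__main__":
    rng = random.Random(0)
    print(switchout("the cat sat on the mat".split(), ["dog", "ran", "hat"], 1.0, rng))
    print(concat_pairs([("a b", "x y"), ("c d", "z w"), ("e f", "u v")]))
    print(relative_gain(20.0, 25.0))
```

Under this reading, a "minimum 25% increase in the BLEU score" corresponds to `relative_gain(...) >= 25.0` for every language; the toy scores above are illustrative and are not results from the paper.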