Keywords: NMT, low-resource, Transformer, Arabic, Swahili
TL;DR: Benchmarking Arabic-Swahili neural machine translation
Abstract: Building neural machine translation (NMT) systems for low-resource languages poses several challenges, mainly due to the lack of parallel data. In this research, we propose a baseline NMT system for translating between Arabic and Swahili. Although these two languages are spoken by nearly 300 million people worldwide, parallel data between them remains severely underrepresented. To address this, we scraped and processed what is, to our knowledge, the largest high-quality Arabic-Swahili parallel corpus. We then used state-of-the-art NMT models, including the Transformer and its multilingual variants, to establish a baseline for bidirectional Arabic-Swahili NMT. Finally, we report an improvement in the performance of our NMT system obtained with the back-translation technique.
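As an illustration of the back-translation idea mentioned above, the sketch below generates synthetic Arabic-Swahili pairs by translating monolingual Swahili sentences into Arabic with an off-the-shelf multilingual model and pairing the machine-translated output with the original text. This is a minimal example, not the authors' pipeline: it assumes the Hugging Face `transformers` library and the public NLLB checkpoint, and the sample sentences are placeholders.

```python
# Hedged back-translation sketch (illustrative only, not the paper's system).
# Monolingual Swahili text is translated into Arabic; the synthetic Arabic is
# then used as the source side, paired with the original Swahili as the target.
from transformers import pipeline

# Assumption: the publicly available NLLB model, which covers both languages.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="swh_Latn",   # Swahili (Latin script)
    tgt_lang="arb_Arab",   # Modern Standard Arabic
)

# Placeholder monolingual Swahili sentences.
monolingual_swahili = [
    "Habari ya asubuhi.",
    "Ninapenda kusoma vitabu.",
]

synthetic_pairs = []
for sw_sentence in monolingual_swahili:
    ar_translation = translator(sw_sentence, max_length=128)[0]["translation_text"]
    # Synthetic source (Arabic) paired with genuine target (Swahili).
    synthetic_pairs.append((ar_translation, sw_sentence))

for ar, sw in synthetic_pairs:
    print(f"{ar}\t{sw}")
```

The resulting synthetic pairs would typically be concatenated with the genuine parallel corpus when training the Arabic-to-Swahili direction, which is the standard use of back-translation for augmenting scarce parallel data.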