Multilingual Automatic Speech Recognition for Kinyarwanda, Swahili, and Luganda: Advancing ASR in Select East African Languages

Published: 03 Mar 2023, Last Modified: 20 Apr 2023, AfricaNLP 2023, Readers: Everyone
Keywords: Automatic Speech Recognition, Deep Learning, African Languages, Multilingual ASR, Kinyarwanda, Swahili, Luganda, Conformer, Common Voice, Low resource ASR, East African ASR, Bantu Languages
TL;DR: We created a multilingual ASR dataset and model for Kinyarwanda, Swahili, and Luganda that achieves an overall WER of 21.91 across the three languages, and 25.48, 17.22, and 21.95 on Kinyarwanda, Swahili, and Luganda respectively.
Abstract: This paper presents a multilingual Automatic Speech Recognition (ASR) model for three East African languages: Kinyarwanda, Swahili, and Luganda. The Common Voice project's African language datasets were used to produce a curated code-switched dataset of 3,900 hours, on which the ASR model was trained. The work included validating the Kinyarwanda dataset and developing a model that achieves a Word Error Rate (WER) of 17.57 on that language. The Kinyarwanda model was then fine-tuned on all three languages and achieved an overall WER of 21.91 on the three curated datasets, with a WER of 25.48 for Kinyarwanda, 17.22 for Swahili, and 21.95 for Luganda. The paper emphasizes the necessity of considering the African environment when developing effective ASR systems, and the significance of supporting many languages when developing ASR for languages with limited resources.
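For readers unfamiliar with the metric, the sketch below shows the standard definition of WER: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a hypothesis, divided by the number of reference words. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the function name `word_error_rate`, whitespace tokenization, and reporting the result as a percentage are all assumptions, and the abstract does not describe any text normalization applied before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage (assumed convention; not stated in the abstract)."""
    ref = reference.split()   # assumption: simple whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (all but the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative English strings, not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the denominator is the reference length, a WER pooled over several test sets generally differs from the unweighted mean of the per-set values whenever the sets differ in size.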