Breaking the Low-Resource Barrier for Dagbani ASR: From Data Collection to Modeling

Published: 03 Mar 2023, Last Modified: 15 Apr 2023
Venue: AfricaNLP 2023
Readers: Everyone
Keywords: ASR, Speech Recognition, Data Creation, Dataset, wav2vec 2.0
TL;DR: ASR Data and Models for Dagbani
Abstract: Developing Automatic Speech Recognition (ASR) systems requires large amounts of high-quality speech data. For low-resourced African languages, however, collecting and annotating such data is difficult because of acute data scarcity and limited funding, so building ASR technologies for these languages remains a daunting task. This paper addresses this challenge for Dagbani, an African language spoken predominantly in Ghana and in parts of northern Togo, by presenting a data collection pipeline and process for a transcribed Dagbani audio dataset. We then use the data to build the world’s first ASR system for Dagbani. We hope this methodology can serve as a blueprint for similar efforts.