Keywords: Automatic Speech Recognition, Low-resource regime, Transfer Learning, Nigerian-accented English
Abstract: Automatic Speech Recognition (ASR) systems have become ubiquitous in daily life, powering voice assistants and transcription services. However, these systems are developed and trained primarily on native English accents and often perform poorly on other accents, including Nigerian-accented English. This research addresses that gap by developing an ASR system for Nigerian-accented English. By building models that accurately transcribe Nigerian-accented English, we aim to ensure equitable access to ASR technologies and services for speakers with Nigerian accents. Using Nigerian-accented speech data, the study applied transfer learning to two pretrained models: NeMo QuartzNet15x5Base-En and Wav2Vec2 XLS-R-300M. NeMo QuartzNet15x5Base-En achieved a Word Error Rate (WER) of 8.2% on the test set with the fastest inference time, 0.156 seconds, while Wav2Vec2 XLS-R-300M achieved a WER of 14.9% on the test set with an inference time of 1.1 seconds. Based on these results, this work presents the NeMo QuartzNet15x5Base-En pretrained model as the stronger of the two models evaluated for ASR modeling, especially in a low-resource regime.
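For readers unfamiliar with the metric, the following is a minimal sketch of how a Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a model hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; they are not taken from the study's code or data, and the authors' evaluation pipeline may apply different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split with no case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (one fewer than the current row) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a WER of 8.2% corresponds to roughly 8 word-level errors per 100 reference words.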
Submission Category: Machine learning algorithms
Submission Number: 29