Abstract: With recent advances, automatic speech recognition (ASR) has moved toward addressing low-resource speech recognition problems using large vocabulary continuous speech recognition (LVCSR). Transfer learning, meta-learning, and unsupervised pre-training are major techniques in the modern paradigm. In this paper, we experiment with transfer learning from an English pre-trained model built on a Recurrent Neural Network (RNN), alongside a baseline end-to-end Lattice-Free Maximum Mutual Information (e2e LF-MMI) model, using 200 hours of OpenSLR data and 40 hours of gathered Sinhala speech data. We used the Facebook Sinhala main corpora and the UCSC full corpus, alongside the UCSC speech corpus, to train the external language models. We achieved a 5.43\% WER on our testing dataset, by far the best WER reported for the low-resource Sinhala language. Finally, we evaluated the best e2e model against the Google speech recognition API for Sinhala speech recognition on a publicly available dataset to examine how well our model performs in common usage.
Paper Type: long
Research Area: Speech recognition, text-to-speech and spoken language understanding