Towards Transfer Learning for end-to-end Sinhala Speech Recognition by Finetuning Pretrained Models

Anonymous

16 Dec 2022 (modified: 05 May 2023) · ACL ARR 2022 December Blind Submission · Readers: Everyone
Abstract: With recent advances, automatic speech recognition (ASR) has moved toward addressing low-resource speech recognition using large-vocabulary continuous speech recognition (LVCSR). Transfer learning, meta-learning, and unsupervised pre-training are the major techniques in this modern paradigm. In this paper, we experiment with transfer learning using an English pre-trained model based on a Recurrent Neural Network (RNN), alongside baseline end-to-end Lattice-Free Maximum Mutual Information (e2e LF-MMI) models, trained on 200 hours of OpenSLR data and 40 hours of collected Sinhala speech data. We used the Facebook Sinhala main corpora and the UCSC full corpus, alongside the UCSC speech corpus, to train the external language models. We achieved a 5.43% WER on our test set, the best WER reported so far for the low-resource Sinhala language. Finally, we evaluated the best e2e model against the Google speech recognition API for Sinhala on a publicly available dataset to examine how well our model performs in common usage.
Paper Type: long
Research Area: Speech recognition, text-to-speech and spoken language understanding
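For context, the sketch below shows how word error rate (WER), the metric reported in the abstract (5.43%), is commonly computed as word-level edit distance normalized by reference length. This is a generic illustration, not the authors' evaluation pipeline.

```python
# Minimal WER sketch: (substitutions + deletions + insertions) / reference words.
# Generic Levenshtein-distance implementation for illustration only.

def wer(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deletions only
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution over four reference words -> WER = 0.25
print(wer("the cat sat down", "the cat sit down"))
```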