CTC regularized model adaptation for improving LSTM RNN based multi-accent Mandarin speech recognition

Published: 2016, Last Modified: 11 May 2023, ISCSLP 2016, Readers: Everyone
Abstract: This paper proposes a novel regularized adaptation method for long short-term memory (LSTM) recurrent neural network (RNN) acoustic models trained with the connectionist temporal classification (CTC) loss function (LSTM-RNN-CTC), with the aim of improving multi-accent Mandarin speech recognition. Directly adjusting the network parameters with a small adaptation set may lead to over-fitting. To avoid this problem, we add a regularization term to the original training criterion. This term forces the conditional probability distribution over initial and final (I/F) sequences estimated by the adapted model to stay close to that of the accent-independent (AI) model. In addition, the hidden layers of the LSTM RNN should be left unchanged; only the accent-specific output layer needs to be fine-tuned with this adaptation method. Experiments on the RASC863 and CASIA regional accent speech corpora show that the proposed method yields a clear improvement over the LSTM-RNN-CTC baseline model.
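
The regularized criterion described in the abstract can be sketched as an interpolation between the CTC loss on the adaptation set and a term penalizing divergence from the AI model. The abstract does not give the exact form, so everything in the block below is an assumption made for illustration: the interpolation weight \rho, the choice of a KL divergence D_KL as the closeness measure, and the symbols \theta_out (adapted output layer), \theta_hid (frozen hidden layers), and \theta^AI (accent-independent parameters).

```latex
% Illustrative sketch only; not the paper's stated formula.
% x: acoustic feature sequence, z: I/F label sequence,
% \mathcal{A}: adaptation set, \rho \in [0, 1]: assumed interpolation weight.
\begin{align}
  \mathcal{L}_{\mathrm{CTC}}(\theta)
    &= -\sum_{(x, z) \in \mathcal{A}} \ln p_{\theta}(z \mid x), \\
  \mathcal{L}_{\mathrm{adapt}}(\theta_{\mathrm{out}})
    &= (1 - \rho)\, \mathcal{L}_{\mathrm{CTC}}(\theta)
     + \rho \sum_{x \in \mathcal{A}}
       D_{\mathrm{KL}}\!\left( p_{\theta^{\mathrm{AI}}}(\cdot \mid x)
       \,\middle\|\, p_{\theta}(\cdot \mid x) \right), \\
  \theta &= \left( \theta_{\mathrm{hid}}^{\mathrm{AI}},\ \theta_{\mathrm{out}} \right)
  \quad \text{with } \theta_{\mathrm{hid}}^{\mathrm{AI}} \text{ held fixed.}
\end{align}
```

Under these assumptions, \rho = 0 reduces to plain fine-tuning of the output layer with the CTC loss, while \rho = 1 leaves only the regularization term, so the adapted model is pulled back toward the AI model's distribution.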