Sequence Discriminative Distributed Training of Long Short-Term Memory Recurrent Neural Networks

20 Jul 2020 · OpenReview Archive Direct Upload
Abstract: We recently showed that Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform state-of-the-art deep neural networks (DNNs) for large scale acoustic modeling where the models were trained with the cross-entropy (CE) criterion. It has also been shown that sequence discriminative training of DNNs initially trained with the CE criterion gives significant improvements. In this paper, we investigate sequence discriminative training of LSTM RNNs in a large scale acoustic modeling task. We train the models in a distributed manner using an asynchronous stochastic gradient descent optimization technique. We compare two sequence discriminative criteria – maximum mutual information and state-level minimum Bayes risk – and we investigate a number of variations of the basic training strategy to better understand issues raised by both the sequential model and the objective function. We obtain significant gains over the CE-trained LSTM RNN model using sequence discriminative training techniques.
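For reference, the maximum mutual information (MMI) criterion mentioned in the abstract is conventionally written as follows; this is the standard form from the sequence-training literature, not quoted from the paper itself (notation assumed: $O_u$ is the observation sequence for utterance $u$, $W_u$ its reference transcription, $S_W$ the HMM state sequence for hypothesis $W$, $P(W)$ the language model probability, and $\kappa$ the acoustic scaling factor):

$$
\mathcal{F}_{\mathrm{MMI}} = \sum_{u} \log \frac{p(O_u \mid S_{W_u})^{\kappa}\, P(W_u)}{\sum_{W} p(O_u \mid S_{W})^{\kappa}\, P(W)}
$$

The numerator scores the reference transcription and the denominator sums over competing hypotheses (in practice approximated by a lattice), so maximizing $\mathcal{F}_{\mathrm{MMI}}$ pushes probability mass toward the correct word sequence relative to its competitors; state-level minimum Bayes risk (sMBR) instead minimizes the expected state-level error over the same lattice.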
