Abstract: Recurrent Neural Networks, in particular Long Short-Term Memory (LSTM) networks, are widely used in Automatic Speech Recognition for language modelling during decoding, usually as a mechanism for rescoring hypotheses. This paper proposes a new architecture for real-time one-pass decoding with LSTM language models. To make decoding efficient, the estimation of look-ahead scores is accelerated by precomputing static look-ahead tables from a pruned n-gram model, drastically reducing the computational cost during decoding. In addition, the LSTM language model is evaluated efficiently by combining Variance Regularization with a lazy-evaluation strategy. The proposed one-pass decoder architecture was evaluated on the well-known LibriSpeech and TED-LIUM v3 datasets. Results show that the proposed algorithm achieves very competitive word error rates (WERs) at real-time factors (RTFs) of around 0.6. Finally, our one-pass decoder is compared with a decoupled two-pass decoder.
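To make the two efficiency ideas in the abstract concrete, the following Python sketch (class and parameter names are illustrative, not taken from the paper) shows how Variance Regularization allows the decoder to score a word with a single unnormalized logit instead of a full softmax over the vocabulary, and how lazy evaluation with a per-history cache avoids recomputing LSTM states for shared hypothesis prefixes. This is a minimal sketch under those assumptions, not the paper's implementation.

```python
import torch

class LazyLSTMScorer:
    """Sketch of lazy, variance-regularized LSTM LM scoring.

    Variance Regularization trains the softmax normalizer log Z(h) to be
    nearly constant, so at decode time log p(w | h) can be approximated by
    the single unnormalized logit minus that constant, skipping the full
    softmax. States are computed lazily and cached per history so shared
    hypothesis prefixes are evaluated only once.
    """

    def __init__(self, embedding, lstm, output_weight, log_z_const=0.0):
        self.embedding = embedding          # torch.nn.Embedding
        self.lstm = lstm                    # torch.nn.LSTM
        self.output_weight = output_weight  # (vocab_size, hidden_size) projection
        self.log_z_const = log_z_const      # near-constant normalizer from VR training
        self.cache = {}                     # history tuple -> (output, (h, c))

    def _advance(self, history):
        """Return LSTM output and state after consuming `history`, lazily."""
        if history in self.cache:
            return self.cache[history]
        prefix, last = history[:-1], history[-1]
        # Recurse on the shorter prefix so each shared prefix is computed once.
        _, state = self._advance(prefix) if prefix else (None, None)
        emb = self.embedding(torch.tensor([[last]]))  # shape (1, 1, emb_dim)
        out, state = self.lstm(emb, state)
        self.cache[history] = (out[0, -1], state)
        return self.cache[history]

    def log_prob(self, history, word):
        """Approximate log p(word | history) from one unnormalized logit."""
        out, _ = self._advance(tuple(history))
        logit = self.output_weight[word] @ out
        return (logit - self.log_z_const).item()
```

Only hypotheses the decoder actually expands trigger an LSTM step, and only one row of the output projection is touched per query; without the near-constant normalizer guaranteed by Variance Regularization, this shortcut would require summing over the whole vocabulary.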