Wake-up-word spotting using end-to-end deep neural network system

Published: 01 Jan 2016, Last Modified: 09 Apr 2025ICPR 2016EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: Deep neural networks (DNNs) have tremendously improved the performance of automatic speech recognition (ASR). On the other hand, end-to-end speech recognition system can achieve state-of-the-art performance using Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) and Connectionist Temporal Classification (CTC) method for unsegmented sequence data. In this paper, we therefor propose a lightweight wake-up-word (WUW) spotting system based on end-to-end DNN architecture, which is intended to provide a great balance between decoding speed, accuracy and model size. The objective is to introduce CTC framework on spotting process, and to enhance the system by WUW-oriented model training and refinement steps. We test the performance of the proposed architecture on a conversational telephone dataset which illustrate that the computation time can be significantly reduced without a significant decrease in the spotting accuracy.
Loading