Variable-activation and variable-input deep neural network for robust speech recognition

Rui Zhao, Jinyu Li, Yifan Gong

Published: 2014, Last Modified: 24 Sept 2023, SLT 2014, Readers: Everyone

Abstract: In a previous study, we proposed the variable-component deep neural network (VCDNN) to improve the robustness of the context-dependent deep neural network hidden Markov model (CD-DNN-HMM). VCDNN models the components of a DNN as a set of polynomial functions of environment variables, more specifically the signal-to-noise ratio (SNR). We applied VCDNN to two types of DNN components: (1) the weight matrix and bias, and (2) the output of each layer. These two methods are called variable-parameter DNN (VPDNN) and variable-output DNN (VODNN), respectively. Although both methods achieved good gains over the standard DNN, they doubled the number of parameters even with only the first-order term of the environment variable. In this study, we propose two new types of VCDNN, namely variable-activation DNN (VADNN) and variable-input DNN (VIDNN). In VADNN the environment variable is applied to the hidden-layer activation function, and in VIDNN it is applied directly to the input. Both add only a negligible number of parameters compared with the standard DNN. Experimental results on the Aurora4 task show that both methods are effective, and that VIDNN outperforms all other VCDNN variants, achieving a 7.69% relative word error reduction over the standard DNN with the smallest increase in the number of parameters.
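The parameter-count argument in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the layer sizes, and the assumption that VPDNN attaches one coefficient set per polynomial order to every layer are all hypothetical, chosen only to match the abstract's statement that a first-order VPDNN doubles the parameters while VIDNN adds very few. VADNN is left out because the abstract does not say how the activation function depends on the SNR.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def standard_dnn_params(layer_sizes):
    """Total parameters of a plain feed-forward DNN."""
    return sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))


def vpdnn_params(layer_sizes, order=1):
    """Assumed VPDNN count: every weight matrix and bias is a polynomial
    in the SNR, so each layer stores (order + 1) coefficient sets."""
    return (order + 1) * standard_dnn_params(layer_sizes)


def vidnn_params(layer_sizes, order=1):
    """Assumed VIDNN count: the SNR terms v, v**2, ..., v**order are
    appended to the input, widening only the first layer."""
    widened = [layer_sizes[0] + order] + list(layer_sizes[1:])
    return standard_dnn_params(widened)


def poly_component(coeffs, snr):
    """Evaluate one scalar component c_0 + c_1*v + c_2*v**2 + ... at SNR v."""
    return sum(c * snr ** j for j, c in enumerate(coeffs))


# Hypothetical layer sizes: input, one hidden layer, output.
sizes = [100, 200, 50]
base = standard_dnn_params(sizes)   # 30250
vp = vpdnn_params(sizes)            # 60500, twice the baseline
vi = vidnn_params(sizes)            # 30450, only 200 more than the baseline
```

Under these assumptions the first-order VPDNN count is exactly twice the baseline, while the VIDNN count grows only by the width of the first hidden layer, which is the sense in which the increase is negligible.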
