Abstract: In this paper, we propose a phonetic state relation graph regularized Deep Neural Network (DNN) for robust acoustic modeling. A DNN-based acoustic model is trained by minimizing a cost function that is usually augmented with regularization terms. Regularization generally encodes prior knowledge that constrains the model parameter space. Various regularizations have been proposed to improve the robustness of DNN-based acoustic models. However, most approaches do not exploit knowledge of the speech generation process, even though this process is the most fundamental prior. For example, l1-norm and l2-norm regularizations are equivalent to placing Laplacian and Gaussian priors, respectively, on the model parameters. This means that no knowledge specific to speech signals is used for regularization. Manifold-based regularization exploits the local linear structure of the observed acoustic features, which are merely realizations of the speech generation process. Therefore, to incorporate prior knowledge of speech generation into regularization, we propose a phonetic state relation graph based approach. The method was evaluated on the TIMIT phone recognition task, where it reduced the phone error rate from 20.8% to 20.3% under the same conditions.
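The prior-regularizer correspondence and the idea of graph-based regularization can be made concrete with a short sketch. It is not the paper's formulation, because the abstract does not say how the phonetic state relation graph enters the cost.

The sketch checks two things:

- l2 and l1 penalties equal the negative log of Gaussian and Laplacian priors, up to a constant.
- A hypothetical graph penalty can be written in graph-Laplacian form.

For the graph penalty, we assume a symmetric matrix `relation` of nonnegative weights between phonetic state labels. The penalty then pulls together the hidden activations of frames whose states are related. The names `graph_penalty`, `laplacian_form`, and `relation` are illustrative, not from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_penalty(weights: Vector, lam: float) -> float:
    # lam * sum(w^2): equals -log of a zero-mean Gaussian prior with
    # variance 1 / (2 * lam), up to an additive constant.
    return lam * sum(w * w for w in weights)


def l1_penalty(weights: Vector, lam: float) -> float:
    # lam * sum(|w|): equals -log of a zero-mean Laplacian prior with
    # scale 1 / lam, up to an additive constant.
    return lam * sum(abs(w) for w in weights)


def neg_log_gaussian(weights: Vector, var: float) -> float:
    # -log N(w; 0, var), summed over independent parameters.
    return sum(w * w / (2 * var) + 0.5 * math.log(2 * math.pi * var) for w in weights)


def neg_log_laplace(weights: Vector, scale: float) -> float:
    # -log Laplace(w; 0, scale), summed over independent parameters.
    return sum(abs(w) / scale + math.log(2 * scale) for w in weights)


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def graph_penalty(hidden: List[Vector], labels: List[int],
                  relation: List[List[float]]) -> float:
    # HYPOTHETICAL regularizer: sum over frame pairs i < j of
    # relation[y_i][y_j] * ||h_i - h_j||^2, so frames whose phonetic
    # states are strongly related get similar hidden representations.
    n = len(hidden)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += relation[labels[i]][labels[j]] * sq_dist(hidden[i], hidden[j])
    return total


def laplacian_form(hidden: List[Vector], labels: List[int],
                   relation: List[List[float]]) -> float:
    # The same penalty as tr(H^T L H), where L = D - A is the Laplacian of the
    # frame affinity A[i][j] = relation[y_i][y_j] (i != j). This identity
    # requires `relation` to be symmetric.
    n = len(hidden)
    a = [[relation[labels[i]][labels[j]] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    deg = [sum(row) for row in a]
    total = 0.0
    for k in range(len(hidden[0])):  # sum over feature dimensions of h_k^T L h_k
        col = [h[k] for h in hidden]
        for i in range(n):
            for j in range(n):
                l_ij = (deg[i] if i == j else 0.0) - a[i][j]
                total += col[i] * l_ij * col[j]
    return total


if __name__ == "__main__":
    hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    labels = [0, 1, 0]  # state index of each frame
    relation = [[1.0, 0.5], [0.5, 1.0]]  # made-up state-to-state weights
    print(graph_penalty(hidden, labels, relation))   # 2.5
    print(laplacian_form(hidden, labels, relation))  # 2.5
```

Writing the penalty in Laplacian form shows how a manifold-style regularizer could be built on a graph over phonetic states instead of over neighboring acoustic feature vectors. The toy numbers above are illustrative only and carry no relation to the reported TIMIT results.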