Stochastic DNN-HMM Training for Robust ASR

Kang Hyun Lee, Woo Hyun Kang, Hyeon Seung Lee, Nam Soo Kim

2018 (modified: 09 Jan 2026) | APSIPA 2018 | Readers: Everyone

Abstract: Since the introduction of deep neural network (DNN)-based acoustic models to automatic speech recognition (ASR), robust ASR using DNNs has been an active area of research. However, most DNN-based techniques do not consider the reliability of their estimates, which degrades ASR performance, especially under training-test mismatch conditions. In this paper, we propose a novel deep learning-based acoustic modeling technique that measures and takes into account this reliability using a single DNN. The proposed approach describes the mapping between the noisy input and the clean features as a stochastic process. Accordingly, statistical modeling is applied to the DNN-based acoustic model to predict the posterior distribution of the clean speech features given distorted input data. In addition, by trying two different probabilistic models for the assumed clean feature distribution, we investigate which distribution is more suitable under various environmental conditions. The proposed technique is shown to outperform conventional DNN-based techniques on the Aurora-4 database and under mismatched noise conditions.
