Feature Learning in Deep Neural Networks - A Study on Speech Recognition Tasks

Dong Yu, Mike Seltzer, Jinyu Li, Jui-Ting Huang, Frank Seide

Jan 17, 2013 (modified: Jan 17, 2013), ICLR 2013 conference submission
  • Decision: Conference oral, ICLR 2013
  • Abstract: Recent studies have shown that deep neural networks (DNNs) perform significantly better than shallow networks and Gaussian mixture models (GMMs) on large vocabulary speech recognition tasks. In this paper we argue that the difficulty in speech recognition is primarily caused by the high variability in speech signals. DNNs, which can be considered a joint model of a nonlinear feature transform and a log-linear classifier, achieve improved recognition accuracy by extracting discriminative internal representations that are less sensitive to small perturbations in the input features. However, if test samples are very dissimilar to training samples, DNNs perform poorly. We demonstrate these properties empirically using a series of recognition experiments on mixed narrowband and wideband speech and speech distorted by environmental noise.
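
The abstract describes a DNN as a joint model of a nonlinear feature transform and a log-linear classifier. The sketch below illustrates that decomposition only; it is not the authors' implementation. The function names, the sigmoid nonlinearity, the layer sizes, the random weights, and the toy input are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def affine(weights, bias, inputs):
    # weights is a list of rows, one row per output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def feature_transform(hidden_layers, inputs):
    """Nonlinear feature transform: a stack of sigmoid hidden layers."""
    hidden = inputs
    for weights, bias in hidden_layers:
        hidden = [sigmoid(z) for z in affine(weights, bias, hidden)]
    return hidden


def log_linear_classifier(weights, bias, features):
    """Softmax over an affine function of the top-layer features."""
    scores = affine(weights, bias, features)
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dnn_posteriors(hidden_layers, output_layer, inputs):
    """Joint model: the classifier is applied to the learned features."""
    features = feature_transform(hidden_layers, inputs)
    weights, bias = output_layer
    return log_linear_classifier(weights, bias, features)


def random_layer(rng, n_in, n_out, scale=0.5):
    # Hypothetical initialisation; a real model would be trained.
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
hidden_layers = [random_layer(rng, 4, 8), random_layer(rng, 8, 8)]
output_layer = random_layer(rng, 8, 3)

x = [0.2, -0.1, 0.4, 0.0]             # toy input feature vector
x_perturbed = [v + 0.01 for v in x]   # small perturbation of the input

posteriors = dnn_posteriors(hidden_layers, output_layer, x)

# Largest change in any top-layer feature caused by the perturbation.
h = feature_transform(hidden_layers, x)
h_perturbed = feature_transform(hidden_layers, x_perturbed)
representation_shift = max(abs(a - b) for a, b in zip(h, h_perturbed))
```

Because the weights here are random rather than trained, `representation_shift` says nothing about the robustness the paper reports. It only shows where such a measurement would be taken: at the output of `feature_transform`, before the log-linear layer.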