Abstract: Owing to the great success of deep learning in speech recognition, there has been growing interest in applying deep learning to speaker verification. Previous investigations have usually focused on using deep neural networks as new classifiers or to extract speaker-dependent features. These approaches are either incompatible with existing speaker verification approaches or unable to achieve significant performance gains in large-scale tasks. Moreover, none of the previous approaches has addressed how to make use of extra unsupervised data. This paper proposes a novel feature engineering approach within the deep learning framework for speaker verification. Hidden-layer outputs of a deep neural network or deep belief network trained on a large amount of speech recognition data are extracted as deep features. These features are then used in a Tandem fashion or concatenated with the original acoustic features for GMM-UBM speaker verification. The proposed approach can make use of large amounts of existing speech recognition data without speaker labels and is easy to combine with other mature classification approaches. Experiments on the core condition of NIST 2006 SRE showed that, in a text-independent task, the proposed approach achieves a 12.8% relative EER improvement over the standard GMM-UBM system. In addition, text-dependent speaker verification experiments were also performed and yielded similarly significant gains.
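
The feature pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the network here is a small sigmoid MLP with random weights standing in for a DNN or DBN trained on speech recognition data, and the layer sizes, the 39-dimensional input, and the function names `hidden_layer_features`, `concat_features`, and `relative_improvement` are all assumptions made for illustration.

```python
import numpy as np

def hidden_layer_features(frames, weights, biases, layer):
    """Forward frames through a sigmoid MLP and return the output
    of hidden layer `layer` (1-indexed). Hypothetical helper."""
    h = frames
    for W, b in zip(weights[:layer], biases[:layer]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h

def concat_features(acoustic, deep):
    """Append deep features to the original acoustic features, frame by frame."""
    return np.concatenate([acoustic, deep], axis=1)

def relative_improvement(baseline_eer, new_eer):
    """Relative EER improvement: the fraction of the baseline EER that is removed."""
    return (baseline_eer - new_eer) / baseline_eer

# Assumed toy setup: 5 frames of 39-dimensional acoustic features and
# random weights in place of a network trained on speech recognition data.
rng = np.random.default_rng(0)
frames = rng.standard_normal((5, 39))
sizes = [39, 64, 20]  # input, first hidden layer, second hidden layer (assumed)
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

deep = hidden_layer_features(frames, weights, biases, layer=2)
combined = concat_features(frames, deep)
print(combined.shape)  # (5, 59): 39 acoustic + 20 deep dimensions per frame
```

In the concatenation variant, `combined` would replace the acoustic features as input to GMM-UBM training and scoring; in a Tandem-style variant, the deep features (possibly after further processing) would be used in place of the acoustic features. `relative_improvement` only spells out how a figure such as the reported 12.8% is computed from a baseline EER and a new EER.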