A robust DBN-vector based speaker verification system under channel mismatch conditions

Disong Wang, Yuexian Zou, Junhong Liu, Yichi Huang

Published: 01 Jan 2016, Last Modified: 24 Apr 2025. Venue: DSP 2016. License: CC BY-SA 4.0

Abstract: Channel variability is one of the biggest challenges for speaker verification (SV). Techniques in the feature, model, and score domains have been applied to mitigate its impact. In this paper, we study robust deep feature learning with a deep belief network (DBN) that takes traditional spectral features such as MFCC or PLP as input. In the training phase, a DBN is trained to map spectral features to the corresponding speaker identity; deep features are then extracted from the k-th hidden layer, where k is chosen to maximize the ratio of between-class distance to within-class distance. In the enrollment phase, the trained DBN extracts deep features at the k-th hidden layer, and these features are averaged to form the k-th DBN-vector. In the test phase, the k-th DBN-vector is extracted from the test utterance and compared with the enrolled k-th DBN-vector to make a verification decision. To validate the learned DBN-vectors for speaker verification, extensive experiments were conducted on Mandarin corpora. Encouragingly, the proposed DBN-vector based SV system outperforms the state-of-the-art i-vector based SV system under channel mismatch conditions in terms of equal error rate (EER) and minimum detection cost function (minDCF).
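
The three phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the DBN itself is omitted, and the frame-level hidden-layer activations are stood in for by plain lists of numbers. The function names, the Euclidean centroid distances, the reading of the selection criterion as between-class over within-class distance, the cosine scoring, and the threshold value are all assumptions made for illustration only.

```python
import math


def mean_vector(frames):
    """Average a list of equal-length feature vectors (one vector per frame)."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def separation_ratio(features_by_speaker):
    """Between-class distance divided by within-class distance (assumed criterion).

    features_by_speaker maps a speaker label to that speaker's frame-level
    features taken from one hidden layer.
    """
    centroids = {s: mean_vector(f) for s, f in features_by_speaker.items()}
    global_mean = mean_vector(list(centroids.values()))
    # Within-class: mean distance of each frame to its own speaker centroid.
    within_distances = [
        euclidean(frame, centroids[s])
        for s, frames in features_by_speaker.items()
        for frame in frames
    ]
    within = sum(within_distances) / len(within_distances)
    # Between-class: mean distance of each speaker centroid to the global mean.
    between = sum(euclidean(c, global_mean) for c in centroids.values()) / len(centroids)
    return math.inf if within == 0 else between / within


def select_layer(features_by_layer):
    """Training phase: pick the layer index k with the largest separation ratio."""
    return max(features_by_layer, key=lambda k: separation_ratio(features_by_layer[k]))


def dbn_vector(frames):
    """Enrollment and test phases: average the layer-k features of one utterance."""
    return mean_vector(frames)


def cosine_score(a, b):
    """Cosine similarity, used here as an assumed scoring function."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def verify(enrolled, test, threshold=0.5):
    """Accept the claimed identity if the score reaches the (hypothetical) threshold."""
    return cosine_score(enrolled, test) >= threshold
```

In practice the threshold would be tuned on held-out trials, for example at the operating point that yields the EER, and the scoring back end could be replaced by any other vector comparison method.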