Abstract: This paper presents an investigation into speaker-invariant feature learning and domain adaptation for language recognition (LR) with short utterances. While following the conventional design of an i-vector front-end and a probabilistic linear discriminant analysis (PLDA) back-end, we propose to apply speaker adversarial multi-task learning (AMTL), aimed explicitly at learning speaker-invariant multilingual bottleneck features, and to perform unsupervised PLDA adaptation to alleviate the performance degradation caused by domain mismatch between training and test data. Through a demo experiment, we show the adverse effect of domain mismatch and motivate the necessity of domain adaptation. LR experiments are carried out on the AP17-OLR challenge dataset to evaluate the effectiveness of the proposed methods in comparison with the state of the art. The results show that both speaker AMTL and unsupervised PLDA adaptation contribute significantly to performance improvement on the short-duration LR task. The effectiveness of PLDA adaptation is found to be insensitive to the number of clusters assumed in unsupervised data labeling. Our best system outperforms the state-of-the-art system of AP17-OLR, showing relative improvements of 6.98% in terms of C_avg and 4.80% in terms of EER on the 1-second test set.