Abstract: This paper presents an investigation into speaker-invariant feature learning and domain adaptation for language recognition (LR) with short utterances. While following the conventional design of an i-vector front-end and a probabilistic linear discriminant analysis (PLDA) back-end, we propose to apply speaker adversarial multi-task learning (AMTL), aimed explicitly at learning speaker-invariant multilingual bottleneck features, and to perform unsupervised PLDA adaptation to alleviate the performance degradation caused by domain mismatch between training and test data. Through a demo experiment, we show the adverse effect of domain mismatch and motivate the necessity of domain adaptation. LR experiments are carried out on the AP17-OLR challenge dataset to evaluate the effectiveness of the proposed methods in comparison with the state of the art. The results show that both speaker AMTL and unsupervised PLDA adaptation contribute significantly to performance improvement on the short-duration LR task. The effectiveness of PLDA adaptation is found to be insensitive to the number of clusters assumed in unsupervised data labeling. Our best system outperforms the state-of-the-art system of AP17-OLR, showing relative improvements of 6.98% in terms of C_avg and 4.80% in terms of EER on the 1-second test set.