PNP-RKD: A Positive-Negative Pair based Relational Knowledge Distillation Method for Cross-Domain Speaker Verification
Abstract: Existing speaker verification (SV) methods based on deep embedding learning suffer from performance degradation under domain shift, which can be alleviated by unsupervised domain adaptation (UDA) techniques. However, while UDA improves global statistical consistency across domains, discriminative information may be overlooked or misaligned in the process. To address this, we propose PNP-RKD, a relational knowledge distillation method that exploits positive and negative pairs drawn from both the source and target domains within a multitask learning framework. Two auxiliary tasks, conducted separately in the source and target domains, support PNP-RKD. Embeddings are learned in a supervised fashion from the labeled source domain, providing a robust foundation of prior knowledge. For the unlabeled target domain, we apply contrastive learning based on swapped prediction, a key component that improves noise robustness and the quality of the learned prototypes. More importantly, it enables reliable pair sampling for PNP-RKD, thereby strengthening the alignment of discriminative knowledge across domains. Extensive experiments on the NIST SRE16 and SRE18 datasets demonstrate the superior performance of the proposed PNP-RKD method, which achieves EERs of 6.83% and 8.28%, respectively.
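To make the pairwise relational distillation idea concrete, the following is a minimal PyTorch sketch of one plausible form of the PNP-RKD objective: a distance-wise relational loss, in the spirit of Park et al.'s relational knowledge distillation, computed over cosine similarities of positive and negative embedding pairs in each domain. The function name `pnp_rkd_loss`, the pair construction, and the Huber-loss form are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def pnp_rkd_loss(src_anchor, src_pos, src_neg,
                 tgt_anchor, tgt_pos, tgt_neg):
    """Hypothetical sketch of a positive-negative pair relational KD loss.

    Each tensor holds speaker embeddings of shape (batch, dim).
    Positive pairs share a speaker label (source) or pseudo-label from
    swapped-prediction clustering (target); negative pairs do not.
    """
    def pair_sims(anchor, pos, neg):
        # Cosine similarity of each anchor to its positive and negative.
        a = F.normalize(anchor, dim=-1)
        pos_sim = (a * F.normalize(pos, dim=-1)).sum(-1)  # (batch,)
        neg_sim = (a * F.normalize(neg, dim=-1)).sum(-1)  # (batch,)
        return pos_sim, neg_sim

    src_pos_sim, src_neg_sim = pair_sims(src_anchor, src_pos, src_neg)
    tgt_pos_sim, tgt_neg_sim = pair_sims(tgt_anchor, tgt_pos, tgt_neg)

    # Distill the *relational* structure rather than the raw embeddings:
    # push the target-domain pair-similarity statistics toward the
    # source-domain ones (Huber loss, as in distance-wise RKD). The
    # source relations act as the teacher, so they are detached.
    return (F.smooth_l1_loss(tgt_pos_sim, src_pos_sim.detach())
            + F.smooth_l1_loss(tgt_neg_sim, src_neg_sim.detach()))
```

In a sketch like this, the quality of the target-domain positive/negative pairs hinges entirely on the pseudo-labels produced by the swapped-prediction contrastive task, which is consistent with the abstract's claim that swapped prediction is what makes reliable sampling for PNP-RKD possible.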