Open-Set Speaker Identification Through Efficient Few-Shot Tuning With Speaker Reciprocal Points and Unknown Samples

Zhiyong Chen, Shuhang Wu, Xinnuo Li, Zhiqi Ai, Shugong Xu

Published: 01 Jan 2025 · Last Modified: 13 Nov 2025 · IEEE Transactions on Audio, Speech and Language Processing · License: CC BY-SA 4.0
Abstract: This paper introduces a novel framework for few-shot open-set speaker identification (OpenSID), aimed at real-world household wake-up and recognition scenarios. To address the limitations of current speaker models and classification methods, our approach combines a pretrained speaker foundation frontend with a few-shot tunable neural network backend. We employ an effective open-set recognition technique, Speaker Reciprocal Points Learning (SpeakerRPL), to enhance discrimination among target speakers while modeling “otherness.” We further propose SpeakerRPL+, which incorporates unknown-sample learning using speech-synthesized unknown samples, significantly boosting few-shot OpenSID performance. We also investigate optimal model-tuning strategies, zero-shot timbre-controllable synthesis methods, and training procedures for SpeakerRPL+, demonstrating its adaptability across various speaker foundation models. Comprehensive evaluations on multiple multilingual, primarily text-dependent speaker recognition datasets confirm the efficacy of our framework in complex household environments, where it outperforms several state-of-the-art speaker foundation models on few-shot OpenSID.
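To make the reciprocal-points idea behind SpeakerRPL concrete, below is a minimal sketch of a reciprocal-points-style open-set classification head trained on frozen frontend embeddings. The paper's exact SpeakerRPL/SpeakerRPL+ formulation is not given in the abstract, so the names (ReciprocalPointsHead, emb_dim, n_speakers, open_margin), the squared-distance logits, and the radius-based open-space term are illustrative assumptions rather than the authors' implementation.

```python
# Sketch: reciprocal-points-style open-set head on top of frozen speaker embeddings.
# Each enrolled speaker k gets a learnable "reciprocal point" modeling non-k samples;
# the logit for speaker k grows with the distance to that point, and an open-space
# term bounds how far own-class samples may drift from it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReciprocalPointsHead(nn.Module):
    def __init__(self, emb_dim: int, n_speakers: int, open_margin: float = 1.0):
        super().__init__()
        # One learnable reciprocal point per enrolled speaker ("otherness" prototype).
        self.points = nn.Parameter(torch.randn(n_speakers, emb_dim) * 0.01)
        # Learnable per-speaker radius bounding the open space around each point.
        self.radius = nn.Parameter(torch.zeros(n_speakers))
        self.open_margin = open_margin

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # Squared distance to every reciprocal point, used directly as logits:
        # utterances of speaker k should lie far from the "non-k" point.
        return torch.cdist(emb, self.points) ** 2  # (batch, n_speakers)

    def loss(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        logits = self.forward(emb)
        ce = F.cross_entropy(logits, labels)
        # Open-space regularization: penalize own-class distances exceeding the radius,
        # keeping unknown (unbounded) regions close to the reciprocal points.
        d_own = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
        open_loss = F.relu(d_own - F.softplus(self.radius[labels])).mean()
        return ce + self.open_margin * open_loss


# Usage sketch: few-shot tuning on frozen frontend embeddings, then rejecting
# test utterances whose maximum logit (distance to all reciprocal points) is low.
if __name__ == "__main__":
    head = ReciprocalPointsHead(emb_dim=192, n_speakers=5)
    emb = torch.randn(8, 192)                # stand-in for frontend embeddings
    labels = torch.randint(0, 5, (8,))
    head.loss(emb, labels).backward()
    score = head(emb).max(dim=1).values      # low score -> likely unknown speaker
```

Under this reading, SpeakerRPL+ would additionally feed synthesized unknown-speaker utterances into the same objective as explicit negatives; that extension is not shown here.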