Generalized Score Comparison-Based Learning Objective for Deep Speaker Embedding

Min Hyun Han, Sung Hwan Mun, Nam Soo Kim

Published: 01 Jan 2025, Last Modified: 09 Jan 2026. Venue: IEEE Access. License: CC BY-SA 4.0

Abstract: In state-of-the-art speaker verification systems, speaker embeddings are trained to be closer to the target speaker prototype, which is either obtained from the other speech samples or constructed with trainable parameters. This can be considered a classification task since the network is trying to learn the features that are most relevant to the corresponding speaker from the input speech. Although classification-based learning demonstrates the ability to extract speaker-related information, it does not guarantee optimal speaker verification performance. In this paper, we propose a score comparison-based learning objective, which guides the training framework to be more consistent with the verification task, enforcing the embedding space to have lower intra-class variance compared to inter-class variance in terms of similarity scores. Furthermore, we propose a generalized loss function for score comparison-based learning, encompassing many conventional training losses and regularization techniques. The proposed technique is compared with the conventional methods using the VoxCeleb, VOiCES, CN-Celeb, and Common Voice datasets. Experimental results demonstrate that the proposed method can boost the performance and make the system more robust to over-fitting in speaker verification tasks.
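The abstract does not give the loss itself, so the following is only a minimal sketch of the general idea of comparing similarity scores, not the authors' formulation. It assumes cosine similarity as the score, a softplus penalty on every (target, non-target) score pair, and hypothetical `margin` and `scale` parameters; the function name `score_comparison_loss` is likewise illustrative.

```python
import numpy as np


def score_comparison_loss(embeddings, labels, margin=0.2, scale=10.0):
    """Illustrative pairwise score-comparison loss.

    Assumption: this is NOT the paper's loss. It penalises every case where a
    non-target (different-speaker) score is not at least `margin` below a
    target (same-speaker) score within the batch.
    """
    emb = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalise so that dot products are cosine similarity scores.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    scores = emb @ emb.T

    # Boolean masks: same-speaker pairs, and the strict upper triangle so that
    # each unordered pair is counted once and self-pairs are excluded.
    same = labels[:, None] == labels[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)

    target = scores[same & upper]
    nontarget = scores[~same & upper]
    if target.size == 0 or nontarget.size == 0:
        raise ValueError("batch needs at least one target and one non-target pair")

    # Compare every non-target score against every target score.
    diff = scale * (nontarget[None, :] - target[:, None] + margin)

    # Softplus log(1 + exp(x)), computed stably via logaddexp.
    return float(np.mean(np.logaddexp(0.0, diff)))


# Toy batch: two speakers with two utterances each (hypothetical 2-D embeddings).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_separated = score_comparison_loss(toy_embeddings, [0, 0, 1, 1])
loss_mixed = score_comparison_loss(toy_embeddings, [0, 1, 0, 1])
```

With the labels matching the geometry of the toy embeddings, target scores exceed non-target scores and the loss is small; with the labels shuffled, non-target scores overtake target scores and the loss grows. This is the sense in which such an objective operates directly on verification scores rather than on class posteriors.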

External IDs: doi:10.1109/access.2025.3552790