Abstract: Self-supervised learning has shown strong performance on speaker verification, and two-stage frameworks, whose training schemes are more comprehensive, typically perform better. These frameworks use clustering to obtain pseudo-labels, which then serve as the supervision signal in stage 2. However, the pseudo-labels often contain a significant amount of label noise, which severely degrades speaker verification performance. In this paper, we propose a dynamic self-supervised pseudo-label correction method based on batch-scale training. By filtering and correcting samples based on their loss and prediction distribution, our method better aligns with the dynamic training process and achieves EERs of 1.33%, 1.56%, and 2.78% on the VoxCeleb-O, VoxCeleb-E, and VoxCeleb-H test sets, respectively.
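The abstract does not give the exact filtering and correction criteria, so the following is only a minimal sketch of what batch-level pseudo-label correction driven by loss and prediction distribution could look like. The function name `correct_batch_labels` and the parameters `keep_ratio` and `conf_threshold` are hypothetical and are not taken from the paper. In this sketch, the lowest-loss fraction of each batch is treated as clean, high-loss samples with a confident prediction are relabeled to the predicted class, and the remaining samples are excluded from the update.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def correct_batch_labels(logits, pseudo_labels, keep_ratio=0.5, conf_threshold=0.9):
    """Filter and correct pseudo-labels within one batch (illustrative only).

    keep_ratio and conf_threshold are assumed hyperparameters; the paper's
    actual criteria are not stated in the abstract.
    Returns (new_labels, keep), where keep[i] is False for dropped samples.
    """
    probs = [softmax(row) for row in logits]
    # Per-sample cross-entropy against the current pseudo-label.
    losses = [-math.log(p[y] + 1e-12) for p, y in zip(probs, pseudo_labels)]

    # The loss cut-off is recomputed for every batch, so it follows training.
    n_clean = max(1, int(keep_ratio * len(losses)))
    loss_cut = sorted(losses)[n_clean - 1]

    new_labels, keep = [], []
    for p, y, loss in zip(probs, pseudo_labels, losses):
        top = max(range(len(p)), key=p.__getitem__)
        if loss <= loss_cut:
            # Low loss: trust the clustering pseudo-label.
            new_labels.append(y)
            keep.append(True)
        elif p[top] >= conf_threshold:
            # High loss but confident prediction: relabel to the predicted class.
            new_labels.append(top)
            keep.append(True)
        else:
            # High loss and uncertain prediction: exclude from this update.
            new_labels.append(y)
            keep.append(False)
    return new_labels, keep
```

Because the cut-off is derived from each batch's own loss values rather than fixed in advance, the split between trusted and suspect samples shifts as the model improves, which is one plausible reading of "dynamic" in the abstract.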
External IDs:dblp:conf/icassp/WangF025a