Abstract: Self-supervised learning (SSL) has significantly advanced speaker verification, especially in scenarios with limited labeled data. This paper introduces Energy-based Confidence-Aware Distillation (EBCA-DINO), an SSL enhancement for speaker verification that integrates Energy-Based Models (EBMs) into the DINO (Distillation with No Labels) framework. EBMs use energy scores to assess data complexity and uncertainty, guiding label-free self-distillation. Adaptive temperature scaling tailors the learning process to data characteristics, allowing the teacher model to dynamically adjust the student model's focus based on sample difficulty. This energy-aware distillation improves speaker verification performance. Experimental results show relative performance gains of 4.3%, 4.9%, and 8.7% on the Vox1-O, E, and H test trials, respectively.
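The abstract does not give the paper's exact formulation, but the core idea of energy-guided, confidence-aware distillation can be sketched. The snippet below is a minimal, hypothetical illustration, assuming the common free-energy score E(x) = -logsumexp(logits) and a simple per-sample linear mapping from normalized energy to teacher temperature (the function names, temperature range, and mapping are illustrative assumptions, not the paper's actual method):

```python
import torch
import torch.nn.functional as F

def energy_scores(logits: torch.Tensor) -> torch.Tensor:
    # Free-energy score: lower energy ~ higher model confidence.
    return -torch.logsumexp(logits, dim=-1)

def adaptive_temperatures(energy: torch.Tensor,
                          t_min: float = 0.04,
                          t_max: float = 0.07) -> torch.Tensor:
    # Hypothetical mapping: min-max normalize batch energies to [0, 1],
    # then interpolate so harder (higher-energy) samples get a softer
    # (higher) teacher temperature. The range mirrors typical DINO values.
    e = (energy - energy.min()) / (energy.max() - energy.min() + 1e-8)
    return t_min + e * (t_max - t_min)

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      student_temp: float = 0.1) -> torch.Tensor:
    # Per-sample teacher temperature derived from the teacher's energy scores.
    tau = adaptive_temperatures(energy_scores(teacher_logits)).unsqueeze(-1)
    teacher_probs = F.softmax(teacher_logits / tau, dim=-1)
    student_logp = F.log_softmax(student_logits / student_temp, dim=-1)
    # Cross-entropy between (sharpened) teacher and student distributions.
    return -(teacher_probs * student_logp).sum(dim=-1).mean()
```

In this sketch the teacher distribution for an easy (low-energy) sample is sharpened more aggressively, so the student receives a more confident target, while hard samples yield softer targets.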
External IDs: dblp:conf/icassp/HaoHBFGZ25