Noise Robust Distillation of Self-Supervised Speech Models via Correlation Metrics

Fabian Ritter Gutierrez, Kuan-Po Huang, Dianwen Ng, Jeremy H. M. Wong, Hung-Yi Lee, Eng Siong Chng, Nancy F. Chen

Published: 2024, Last Modified: 07 Oct 2025. Venue: ICASSP Workshops 2024. License: CC BY-SA 4.0

Abstract: Compared to large speech foundation models, small student models exhibit degraded noise robustness. The student’s robustness can be improved by introducing noise at the inputs during pre-training. Even so, the standard distillation loss still yields a student with degraded performance. This paper therefore proposes improving student robustness via distillation with correlation metrics. Teacher behavior is learned by driving the cross-correlation matrix between the teacher and student representations towards the identity matrix. Noise robustness is encouraged by minimizing the student’s self-correlation. The proposed method consistently outperforms the previous approach on the Intent Classification, Keyword Spotting, and Automatic Speech Recognition tasks on the SUPERB Challenge.
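The two correlation terms described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the per-feature standardization, the squared-error penalties, and the `off_diag_weight` value are all assumptions made for illustration. In particular, "self-correlation minimization" is interpreted here as penalizing the off-diagonal entries of the student's own feature correlation matrix, which the abstract does not spell out.

```python
import numpy as np


def standardize(x, eps=1e-8):
    """Zero-mean, unit-variance per feature, computed over the frame axis."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)


def cross_correlation_loss(teacher, student, off_diag_weight=5e-3):
    """Push the teacher-student cross-correlation matrix towards identity.

    teacher, student: arrays of shape (frames, features) with equal shapes.
    off_diag_weight is a hypothetical trade-off weight, not taken from the paper.
    """
    t = standardize(teacher)
    s = standardize(student)
    c = t.T @ s / t.shape[0]                  # (features, features)
    diag = np.diag(c)
    on_diag = np.sum((diag - 1.0) ** 2)       # diagonal entries should be 1
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)  # other entries should be 0
    return on_diag + off_diag_weight * off_diag


def self_correlation_loss(student):
    """Penalize correlation between different student feature dimensions.

    Assumed reading of "self-correlation minimization": only off-diagonal
    entries of the student's correlation matrix are penalized.
    """
    s = standardize(student)
    c = s.T @ s / s.shape[0]
    off = c - np.diag(np.diag(c))
    return np.sum(off ** 2)
```

Under these assumptions, a training objective would add the two terms, for example `cross_correlation_loss(t, s) + lam * self_correlation_loss(s)`, where `lam` is a hypothetical weighting hyperparameter and `s` is computed from noise-augmented input.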