Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation For Code-Switching ASR Using Realistic Data

Published: 01 Jan 2024 · Last Modified: 09 Oct 2025 · SLT 2024 · CC BY-SA 4.0
Abstract: Recent advances in automatic speech recognition (ASR) often rely on large speech foundation models to generate high-quality transcriptions. However, these models can be impractical when computing resources are limited. The situation is even more severe in realistic or difficult scenarios such as code-switching ASR (CS-ASR). To address this, we present a framework for developing more efficient CS-ASR models through knowledge distillation using realistic speech-only data. Our proposed method, Leave No Knowledge Behind During Knowledge Distillation ($\mathrm{K}^{2}\mathrm{D}$), leverages both the teacher model’s knowledge and additional insights from a small auxiliary model. We evaluate our approach on two in-domain and two out-of-domain datasets, demonstrating that $\mathrm{K}^{2}\mathrm{D}$ is effective. By conducting $\mathrm{K}^{2}\mathrm{D}$ on unlabeled realistic data, we obtain a model that is two times smaller and five times faster at generation, while outperforming both the baseline methods and the teacher model on all test sets.
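To make the general recipe the abstract describes more concrete, below is a minimal PyTorch sketch of knowledge distillation from teacher pseudo-labels on unlabeled audio, with a small auxiliary model used to filter utterances. The `transcribe` methods, the agreement-based filter, and all hyperparameters are illustrative assumptions for this sketch, not the paper's exact $\mathrm{K}^{2}\mathrm{D}$ procedure.

```python
# Hypothetical sketch: teacher pseudo-labelling on unlabeled speech, filtered by a
# small auxiliary model, then a standard distillation objective for the student.
# Model APIs (`transcribe`), the filtering rule, and hyperparameters are assumptions.
import torch
import torch.nn.functional as F


def token_agreement(hyp_a, hyp_b):
    """Fraction of positions where two token-ID lists agree (simple stand-in metric)."""
    if not hyp_a or not hyp_b:
        return 0.0
    length = min(len(hyp_a), len(hyp_b))
    matches = sum(a == b for a, b in zip(hyp_a[:length], hyp_b[:length]))
    return matches / max(len(hyp_a), len(hyp_b))


def pseudo_label(teacher, aux_model, unlabeled_audio, agreement_threshold=0.7):
    """Generate teacher pseudo-labels; keep utterances where the auxiliary model's
    hypothesis sufficiently agrees with the teacher's (a proxy for extra knowledge
    from the small auxiliary model)."""
    kept = []
    with torch.no_grad():
        for audio in unlabeled_audio:
            teacher_hyp = teacher.transcribe(audio)  # assumed API
            aux_hyp = aux_model.transcribe(audio)    # assumed API
            if token_agreement(teacher_hyp, aux_hyp) >= agreement_threshold:
                kept.append((audio, teacher_hyp))
    return kept


def distillation_loss(student_logits, teacher_logits, pseudo_targets, T=2.0, alpha=0.5):
    """Common KD objective: cross-entropy on pseudo-labels plus temperature-scaled
    KL divergence to the teacher's output distribution.

    student_logits, teacher_logits: (batch, time, vocab); pseudo_targets: (batch, time).
    """
    ce = F.cross_entropy(student_logits.transpose(1, 2), pseudo_targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1 - alpha) * kl
```

The filtered `(audio, pseudo-label)` pairs would then be used as the student's training set, with the teacher's logits providing the soft targets in `distillation_loss`; this mirrors the speech-only, label-free setting in the abstract, though the specific filtering and loss weighting here are only placeholders.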