Knowledge Distillation From Ensemble for Spoken Language Identification

Published: 01 Jan 2025 · Last Modified: 13 Oct 2025 · ICASSP 2025 · CC BY-SA 4.0
Abstract: Spoken language identification (LID) has seen substantial performance gains with the rise of large-scale models. However, these models are often computationally expensive and impractical for many real-world applications. In this work, we propose a novel knowledge-distillation-from-ensemble framework to address this challenge. By distilling an ensemble of large LID models into a single, more efficient student, we achieve comparable or even superior performance while reducing computational cost by 67%. Our approach yields a student model less than 10% of the size of a 200M+ parameter teacher ensemble, yet it outperforms a 140M-parameter teacher by 13% relative. Additionally, combining our distillation technique with decoupled knowledge distillation leads to substantial gains (50% relative), especially for confusable and low-resource languages in the FLEURS dataset.
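The abstract only outlines the approach, so the sketch below is an illustrative approximation rather than the authors' exact formulation: a minimal PyTorch example of distilling the averaged distribution of a teacher ensemble into a single student LID model via a temperature-scaled KL term plus hard-label cross-entropy. All function names, arguments, and weightings here are hypothetical; the paper additionally combines this with decoupled knowledge distillation, which splits the soft-label term into target- and non-target-class components and is not shown.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list, labels,
                               temperature=2.0, alpha=0.5):
    """Hypothetical sketch: distill an averaged ensemble of teacher
    distributions into a single student LID model.

    student_logits:      (batch, num_languages) logits from the student
    teacher_logits_list: list of (batch, num_languages) logits, one per teacher
    labels:              (batch,) ground-truth language indices
    """
    # Average the teachers' temperature-softened probability distributions.
    teacher_probs = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits_list]
    ).mean(dim=0)

    # Soft-label term: KL divergence between the ensemble distribution and
    # the student's softened distribution (standard distillation loss).
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = F.kl_div(student_log_probs, teacher_probs,
                       reduction="batchmean") * temperature ** 2

    # Hard-label term: cross-entropy against the ground-truth language.
    ce_loss = F.cross_entropy(student_logits, labels)

    # Weighted combination; alpha balances soft- and hard-label supervision.
    return alpha * kd_loss + (1.0 - alpha) * ce_loss
```

The `temperature ** 2` scaling is a common convention that keeps gradient magnitudes comparable across temperatures; the averaging of teacher probabilities (rather than logits) is one of several plausible ways to form the ensemble target.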