Dynamic Ensemble Teacher-Student Distillation Framework for Light-Weight Fake Audio Detection

Jun Xue; Cunhang Fan; Jiangyan Yi; Jian Zhou; Zhao Lv

Published: 01 Jan 2024, Last Modified: 28 Sept 2024. IEEE Signal Process. Lett. 2024. CC BY-SA 4.0

Abstract: Fake audio detection (FAD) has made great progress in recent years, and lightweight models are important for fast, reliable audio authenticity verification on resource-limited devices. However, most research neglects model size when improving FAD performance. To bring FAD to small devices, this paper proposes a novel lightweight network named Light-ECA2Net. Because networks of different depths differ in their ability to capture fake speech artifacts, the paper also proposes a dynamic ensemble teacher-student distillation framework to transfer distilled knowledge fully. The dynamic ensemble distillation has two parts. First, one-to-one feature mapping is used to perceive multidimensional feature knowledge, and the weight of each feature dimension is adjusted dynamically using ground-truth labels, which enables the student to receive feature knowledge efficiently. Second, different network layers have their own predictive strengths, so dynamically weighting their predictions can further improve the student's learning ability. Experimental results on the ASVspoof 2019 LA and PA datasets show that, compared to the baseline, the system improves performance while reducing model complexity by 45%.
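The abstract does not give the weighting formula, so the following is only an illustrative sketch of one way label-guided dynamic weighting of several teacher outputs could look. It assumes that each teacher's weight is a softmax over its negative cross-entropy against the ground-truth label, and that the student is trained with a weighted sum of KL divergences. All function names and the example probabilities are hypothetical and are not taken from the paper.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Cross-entropy of one predicted distribution against a hard label.
    return -math.log(max(probs[label], 1e-12))

def dynamic_teacher_weights(teacher_probs, label):
    # Assumption: a teacher output that agrees better with the ground
    # truth (lower cross-entropy) receives a larger weight.
    losses = [cross_entropy(p, label) for p in teacher_probs]
    return softmax([-loss for loss in losses])

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(
        pi * math.log(pi / max(qi, 1e-12))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def ensemble_distillation_loss(teacher_probs, student_probs, label):
    # Weighted sum of per-teacher KL terms, using the dynamic weights.
    weights = dynamic_teacher_weights(teacher_probs, label)
    return sum(
        w * kl_divergence(t, student_probs)
        for w, t in zip(weights, teacher_probs)
    )

# Hypothetical outputs of three teacher branches for one utterance
# (class 0 = bona fide, class 1 = fake), with ground-truth label 0.
teachers = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
student = [0.7, 0.3]
weights = dynamic_teacher_weights(teachers, label=0)
loss = ensemble_distillation_loss(teachers, student, label=0)
```

With this particular choice, each weight is proportional to the probability the teacher assigns to the correct class, so a branch that misclassifies the sample contributes little to the student's target. The same pattern could in principle be applied to feature-level losses, but how the paper does so is not stated in the abstract.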
