Abstract: Knowledge distillation (KD) is an effective technique for transferring knowledge from large teacher models to lightweight student models, enabling deployment on edge devices. However, noisy labels are inevitably produced by teacher models during KD and are also inherent in large-scale datasets, degrading both student and teacher models. To tackle these issues, we propose a novel and efficient approach called AL2NL (Adaptively Learn to Learn for combating Noisy Labels). Specifically, we design a joint bootstrapping loss (JoBS) that integrates the traditional bootstrapping loss with a label-independent regularization term to fully utilize all training data. JoBS consists of three components: a classification loss targeting clean samples, a correction loss for noisy samples whose labels are reliably corrected, and a regularization loss for noisy samples with unreliable corrections. This framework enables models to iteratively improve themselves using their own predictions and contrastive representations. Furthermore, we introduce an adaptive meta-learning sample re-weighting mechanism that dynamically adjusts the importance of each JoBS component. To reduce dependence on prior knowledge, we present a dynamic sample mining strategy that estimates a class-balanced clean subset tailored to our meta-learning process. Comprehensive evaluations on both synthetic and real-world noisy benchmark datasets verify the effectiveness of AL2NL.
DOI: 10.1109/TCE.2025.3610163
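The abstract describes JoBS as a weighted combination of a classification term for clean samples, a correction term for reliably corrected noisy samples, and a label-independent regularization term for the rest, with per-sample weights supplied by a meta-learned re-weighter. The following is a minimal PyTorch-style sketch of such a three-part loss, not the authors' implementation: the function name `jobs_loss`, the integer `partition` encoding, and the use of prediction entropy as the label-independent regularizer are illustrative assumptions, since the paper's exact formulation is not given in the abstract.

```python
# Hypothetical sketch of a three-part joint bootstrapping loss with
# per-sample weights. Assumes an upstream step has partitioned samples
# into clean / reliably-corrected / unreliable groups.
import torch
import torch.nn.functional as F

def jobs_loss(logits, noisy_labels, corrected_labels, partition, weights):
    """Weighted mix of classification, correction, and regularization terms.

    logits:           (N, C) model outputs
    noisy_labels:     (N,)   observed (possibly noisy) labels
    corrected_labels: (N,)   pseudo-labels for samples deemed correctable
    partition:        (N,)   0 = clean, 1 = reliably corrected, 2 = unreliable
    weights:          (N,)   per-sample weights (e.g., from a meta-learned re-weighter)
    """
    probs = F.softmax(logits, dim=1)

    # Classification loss on samples judged clean (uses the observed labels).
    ce_clean = F.cross_entropy(logits, noisy_labels, reduction="none")
    # Correction loss on noisy samples with reliable pseudo-labels.
    ce_corrected = F.cross_entropy(logits, corrected_labels, reduction="none")
    # Label-independent regularization (prediction entropy here, as a
    # placeholder) for noisy samples whose corrections are unreliable.
    entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1)

    # Select the appropriate term per sample, then apply adaptive weights.
    per_sample = torch.where(
        partition == 0, ce_clean,
        torch.where(partition == 1, ce_corrected, entropy),
    )
    return (weights * per_sample).mean()
```

In the paper's framework the `weights` would be produced by the adaptive meta-learning re-weighting mechanism and the partition by the dynamic sample mining strategy; both are treated as given inputs in this sketch.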