Annealing Distillation Algorithm for Transferring Unsupervised Clustering Knowledge to Supervised Student Models
Abstract: In knowledge distillation, the performance of the teacher model often serves as an upper limit for the student model. For a long time, deeper and more accurate supervised learning algorithms have been the first choice for teacher models in image classification tasks, where unsupervised models typically underperform. As a result, the value of unsupervised teachers for distillation has remained largely unexplored. This paper demonstrates an effective path for distilling an unsupervised teacher's clustering knowledge into student models. Unlike traditional distillation methods, in which the teacher guides the student throughout training, we use an annealing strategy that progressively decreases the teacher's influence while increasing the weight of the student model's own contribution to the distillation loss. Experiments show that, when transferring unsupervised knowledge, the proposed method (AD) improves the student's accuracy by an average of 9.38% over the state-of-the-art. When transferring supervised knowledge, the proposed method performs slightly worse than the state-of-the-art but converges faster in the early epochs.
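As a rough illustration of the annealing idea described in the abstract, the sketch below decays the weight on the teacher-guided distillation term while increasing the weight on the student's own supervised loss. This is a minimal assumption-laden sketch: the linear schedule, temperature, and KL-based soft-target loss are illustrative choices, not the paper's exact formulation.

```python
import torch.nn.functional as F

def annealed_distillation_loss(student_logits, teacher_logits, labels,
                               epoch, total_epochs, temperature=4.0):
    """Blend a teacher-guided KD term with the student's own supervised loss,
    annealing the teacher's weight from 1 toward 0 over training.

    Note: the schedule and loss form here are assumptions for illustration,
    not the exact method from the paper.
    """
    # Linear annealing: teacher weight alpha decays from 1 to 0 across epochs.
    alpha = max(0.0, 1.0 - epoch / total_epochs)

    # Soft-target KD term: the teacher's (clustering-derived) outputs are
    # treated as soft labels for the student.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Student's own supervised cross-entropy term on ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)

    return alpha * kd + (1.0 - alpha) * ce
```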
External IDs: dblp:conf/icassp/GaoGZX25