Exploring the Knowledge Transferred by Response-Based Teacher-Student Distillation

Published: 01 Jan 2023, Last Modified: 11 Nov 2024 · ACM Multimedia 2023 · CC BY-SA 4.0
Abstract: Response-based knowledge distillation refers to the technique of supervising the student network with the teacher network's predictions. The method is motivated by the observation that the predicted probabilities reflect the relations among labels, which is the knowledge to be transferred. This paper explores the transferred knowledge from a novel perspective: comparing the knowledge transferred through different teachers. Two intriguing properties are observed. First, higher confidence scores in the teacher's predictions lead to better distillation results; second, training samples that the teacher predicts incorrectly should be kept for distillation. We then analyze these phenomena by studying the teachers' decision boundaries, some of which help the student generalize while others may not. Based on these observations, we further propose an embarrassingly simple distillation framework named Efficient Distillation, which is effective on ImageNet with different teacher-student pairs: when using ResNet34 as the teacher, the student ResNet18 trained from scratch reaches 74.07% Top-1 accuracy within 98 GPU hours (RTX 3090), outperforming the current state-of-the-art result (73.19%) by a large margin. Our code is available at https://github.com/lsongx/EffDstl.
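For context, the abstract builds on the standard response-based distillation objective, where the student is trained against the teacher's softened predictions alongside the ground-truth labels. Below is a minimal PyTorch sketch of that baseline loss; it is not the paper's Efficient Distillation framework (see the linked repository for that), and the function name `kd_loss` and the `temperature`/`alpha` hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, temperature=4.0, alpha=0.9):
    """Baseline response-based distillation: KL divergence between softened
    teacher and student predictions, mixed with the usual cross-entropy loss."""
    # Softened probabilities; the temperature exposes the inter-class
    # relations in the teacher's prediction (the knowledge being transferred).
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)

    # The KL term is scaled by T^2 so its gradients stay comparable in
    # magnitude to the cross-entropy term.
    distill = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, targets)
    return alpha * distill + (1 - alpha) * ce
```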