Abstract: Knowledge distillation transfers knowledge from a teacher model to a student model, significantly enhancing the capabilities of the student network. However, alleviating the information gap between corresponding stages of the student and teacher during distillation remains a challenge. It is particularly noticeable at deeper levels, where the limited capacity of the student may lead it to capture less information and thus learn poorly. To overcome this limitation, we introduce a novel multi-stage local feature distillation method that fuses multiple feature maps into a representation, called the tutor, to bridge this gap. Additionally, we design the Value Attention-Based Fusion module (Value-ABF) to make the feature fusion more effective. Compared with other distillation methods, our approach achieves comparable or even superior results and better training efficiency on the CIFAR-100 and COCO2017 datasets for image classification, object detection, and instance segmentation.
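To make the described mechanism concrete, the following is a minimal PyTorch sketch of an attention-based fusion step that combines a shallower student feature with a previously fused (deeper) feature into a single "tutor" map. The module name `ValueABFSketch`, the channel projections, and the exact fusion rule are illustrative assumptions and are not taken from the paper's actual implementation.

```python
# Sketch of attention-based fusion of two feature maps into a "tutor" feature,
# loosely following the Value-ABF idea described in the abstract.
# All names, shapes, and the fusion rule here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ValueABFSketch(nn.Module):
    """Fuse a shallower feature with a deeper (already fused) feature using
    learned per-pixel attention weights, yielding a fused "tutor" map."""

    def __init__(self, in_channels: int, mid_channels: int):
        super().__init__()
        # Project the shallower input to the shared channel width (assumed design).
        self.reduce = nn.Conv2d(in_channels, mid_channels, kernel_size=1)
        # Predict two per-pixel attention maps, one for each input feature.
        self.attn = nn.Conv2d(mid_channels * 2, 2, kernel_size=1)

    def forward(self, low_feat: torch.Tensor, high_feat: torch.Tensor) -> torch.Tensor:
        low = self.reduce(low_feat)
        # Upsample the deeper (spatially smaller) feature to the shallower one's size.
        high = F.interpolate(high_feat, size=low.shape[-2:], mode="nearest")
        weights = torch.softmax(self.attn(torch.cat([low, high], dim=1)), dim=1)
        # Attention-weighted sum of the two features gives the fused "tutor" map.
        return low * weights[:, 0:1] + high * weights[:, 1:2]


if __name__ == "__main__":
    fuse = ValueABFSketch(in_channels=256, mid_channels=128)
    shallow = torch.randn(2, 256, 28, 28)  # hypothetical student stage feature
    deeper = torch.randn(2, 128, 14, 14)   # hypothetical previously fused feature
    tutor = fuse(shallow, deeper)
    print(tutor.shape)  # torch.Size([2, 128, 28, 28])
```

In a multi-stage setup of this kind, the resulting tutor feature would typically be matched against the corresponding teacher-stage feature with a distance loss (e.g., MSE), stage by stage; the specific loss and stage pairing used by the paper are not specified in the abstract.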