Abstract: We propose a simple yet effective approach, called joint guided learning (JGL), based on knowledge distillation for image classification. Knowledge distillation transfers knowledge from a teacher model to a student model, where the teacher's predicted labels, both correct and incorrect, guide the optimization of the student. However, incorrect predicted labels negatively influence the student model. Moreover, relying solely on predicted labels is not sufficient to transfer knowledge; the student model should also make full use of the teacher's features. To mitigate these issues, we design a JGL approach that exploits the joint guidance of features and predicted labels. Our method comprises two novel components: 1) a guided label refinery module that retains the teacher's correct predictions and ignores its incorrect ones, and 2) a channel distillation (CD) module that guides the student model to learn the attention map of each channel of the teacher's feature maps. Experimental results show that our approach achieves 82.18%, 66.08%, 88.36%, and 90.14% accuracy on CIFAR-100, TinyImageNet, CUB-200-2011, and Stanford Dogs, respectively, consistently outperforming state-of-the-art approaches on all four image classification datasets. Ablation studies further confirm the contributions of the individual components of our method.
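The abstract describes two loss components: distillation from only the teacher's correct predictions, and matching per-channel attention derived from feature maps. The PyTorch-style sketch below is a minimal, hypothetical illustration of how such an objective could be assembled; the function names, the per-channel spatial-softmax form of the attention map, and the loss weights alpha/beta are our assumptions, not details taken from the paper, and the teacher and student feature maps are assumed to share the same shape.

```python
import torch
import torch.nn.functional as F

def guided_label_refinery_loss(student_logits, teacher_logits, targets, T=4.0):
    # Hypothetical sketch: distill only from samples the teacher classifies
    # correctly, ignoring its incorrect predictions.
    with torch.no_grad():
        correct = teacher_logits.argmax(dim=1).eq(targets)   # mask of correct teacher predictions
    if correct.sum() == 0:
        return student_logits.new_zeros(())                  # no correct predictions in this batch
    p_t = F.softmax(teacher_logits[correct] / T, dim=1)      # softened teacher labels
    log_p_s = F.log_softmax(student_logits[correct] / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def channel_distillation_loss(student_feat, teacher_feat):
    # Hypothetical sketch: match per-channel attention maps, here taken as a
    # softmax over the spatial positions of each channel (an assumed form).
    def channel_attention(feat):
        b, c, h, w = feat.shape
        return F.softmax(feat.view(b, c, h * w), dim=2)
    return F.mse_loss(channel_attention(student_feat), channel_attention(teacher_feat))

def jgl_loss(student_logits, teacher_logits, student_feat, teacher_feat, targets,
             alpha=1.0, beta=1.0):
    # Illustrative combined objective; alpha and beta are placeholder weights.
    ce = F.cross_entropy(student_logits, targets)             # standard supervised loss
    kd = guided_label_refinery_loss(student_logits, teacher_logits, targets)
    cd = channel_distillation_loss(student_feat, teacher_feat)
    return ce + alpha * kd + beta * cd
```

In this sketch the student receives gradients from three signals: the ground-truth labels, the teacher's correct soft labels, and the teacher's channel attention, mirroring the "joint guidance of features and predicted labels" described above.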