Abstract: Knowledge distillation, a class of model compression algorithms, has been widely adopted due to its ease of implementation and effectiveness. However, transferring knowledge from a teacher network to a student network encounters a bottleneck. Specifically, for the same student network, the performance improvement may remain limited even when a stronger teacher network provides guidance. This situation indicates that the student has an incomplete understanding of the transferred knowledge. To address this issue, we propose a distillation approach referred to as Knowledge Distillation with Classmate (KDC). This approach introduces an untrained classmate network alongside traditional knowledge distillation, enabling collaborative learning between the student network and its classmate. Through this collaborative learning mechanism, the student network gains a better understanding of the dark knowledge conveyed by the teacher network, leading to improved performance. Compared to traditional knowledge distillation, our approach achieves better performance on the CIFAR-100 dataset. Additionally, since a distillation strategy based on curriculum learning can effectively improve the performance of a student network, we combine KDC with a curriculum-learning strategy. Experimental results indicate that this combination further improves the performance of the student network.
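To make the idea concrete, the following is a minimal sketch of how a KDC-style objective could be assembled, assuming a standard soft-target distillation term from the teacher plus a collaborative (mutual) term between the student and the untrained classmate. The function name, the weights `alpha`/`beta`, and the temperature `T` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def kdc_loss(student_logits, classmate_logits, teacher_logits, labels,
             T=4.0, alpha=0.5, beta=0.5):
    """Sketch of a KDC-style loss: hard-label cross-entropy, soft-target
    distillation from the teacher, and a collaborative term with the
    classmate. Weights and temperature are placeholder hyperparameters."""
    # Hard-label supervision for the student.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-target KD term: the student mimics the teacher's softened outputs.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T

    # Collaborative term: the student aligns with the classmate's predictions.
    mutual = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                      F.softmax(classmate_logits / T, dim=1),
                      reduction="batchmean") * T * T

    return ce + alpha * kd + beta * mutual
```

In this sketch, both the student and the classmate would be trained with analogous losses so that the collaborative term is mutual; the teacher remains frozen, as in traditional knowledge distillation.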