Abstract: Knowledge distillation (KD) is a widely applicable DNN (Deep Neural Network) compression technique that aims to transfer knowledge from a pretrained teacher neural network to a target student neural network. In practice, the knowledge of an enormous pretrained teacher is compressed to train a relatively compact student. Current KD approaches mostly minimize the divergence between the intermediate layers or logits of the teacher and student networks. However, these methods ignore the important feature distribution in the teacher network's space, which leads to deficiencies of current KD approaches on fine-grained categorization tasks, e.g., metric learning. To address this, we propose a novel approach that transfers the feature distribution in hyperspherical space from the teacher network to the student network. Specifically, our approach encourages the student to learn the distribution among samples in the teacher's space and reduces the intra-class variance. Extensive experimental evaluations on three well-known metric learning datasets show that our method distills higher-level knowledge from the teacher network and achieves state-of-the-art performance.
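To make the idea of distilling a feature distribution in hyperspherical space more concrete, below is a minimal PyTorch sketch of one way such a loss could be set up: both teacher and student embeddings are projected onto the unit sphere, and the student's per-sample similarity distribution within a batch is matched to the teacher's. The function name, the temperature, and the choice of KL divergence over pairwise cosine similarities are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def hyperspherical_distribution_kd(student_feats, teacher_feats, temperature=0.1):
    """Match the student's pairwise similarity distribution to the teacher's
    after projecting both sets of features onto the unit hypersphere.

    student_feats, teacher_feats: (B, d_s) and (B, d_t) feature tensors
    computed from the same mini-batch of samples.
    (Hypothetical sketch of a distribution-matching distillation term.)
    """
    # Project features onto the unit hypersphere (L2 normalization).
    s = F.normalize(student_feats, dim=1)
    t = F.normalize(teacher_feats, dim=1)

    # Pairwise cosine similarities among samples within the batch.
    sim_s = s @ s.t() / temperature   # (B, B)
    sim_t = t @ t.t() / temperature   # (B, B)

    # Drop self-similarities so each row describes a sample's relation
    # to the other samples in the batch.
    batch = s.size(0)
    mask = ~torch.eye(batch, dtype=torch.bool, device=s.device)
    sim_s = sim_s[mask].view(batch, batch - 1)
    sim_t = sim_t[mask].view(batch, batch - 1)

    # KL divergence between teacher (fixed target) and student
    # per-sample similarity distributions.
    log_p_s = F.log_softmax(sim_s, dim=1)
    p_t = F.softmax(sim_t, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean")
```

In a training loop, such a term would typically be added to the student's task loss (e.g., a metric learning loss) with a weighting coefficient, while the teacher's features are computed under `torch.no_grad()`.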
External IDs: dblp:conf/cikm/LiuZM22