Abstract: Recent online knowledge distillation (OKD) methods focus on capturing rich and useful intermediate information by performing multi-layer feature learning. However, existing works only match intermediate feature maps between the same layers and ignore valuable information across layers, which leads to a lack of appropriate cross-layer supervision during learning. Moreover, this manner provides insufficient supervision for the students, since it fails to construct a qualified teacher. In this work, we propose a Deep Cross-layer Collaborative Learning network (DCCL) for OKD, which efficiently exploits the fruitful knowledge of peer student models by maintaining appropriate intermediate cross-layer supervision. Specifically, each student gradually integrates its own features from different layers for feature matching, so as to effectively combine low-level and high-level features and learn more composite knowledge. Moreover, we adopt a collaborative knowledge learning strategy in which a qualified teacher is established by fusing the features of the last convolution layers to enhance high-level representation. In this way, all student models continuously transfer the teacher's rich internal representation and capture its dynamic growth, and in turn assist the learning of the fusion teacher, which further supervises the students. In experiments, our proposed DCCL shows strong generalization ability with various backbone models on CIFAR-100, Tiny ImageNet and ImageNet, and also demonstrates superior performance against mainstream OKD works. Our code is available at https://github.com/nanxiaotong/DCCL.
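To make the fusion-teacher idea concrete, below is a minimal sketch of how such a teacher could be assembled from the last-convolution-layer features of the peer students and used to supervise them. The abstract only states that these features are fused; the concatenation followed by a 1x1 convolution, the classifier head, and all names (`FusionTeacher`, `distill_loss`, the temperature `T`) are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionTeacher(nn.Module):
    # Hypothetical sketch: fuse the last-conv-layer feature maps of the peer
    # students (concat + 1x1 conv here is an assumption) and attach a
    # classifier head so the fused representation yields teacher logits.
    def __init__(self, num_students: int, channels: int, num_classes: int):
        super().__init__()
        self.fuse = nn.Conv2d(num_students * channels, channels, kernel_size=1)
        self.fc = nn.Linear(channels, num_classes)

    def forward(self, last_feats):
        # last_feats: list of (B, C, H, W) feature maps, one per peer student
        fused = F.relu(self.fuse(torch.cat(last_feats, dim=1)))
        pooled = F.adaptive_avg_pool2d(fused, 1).flatten(1)
        return self.fc(pooled)


def distill_loss(student_logits, teacher_logits, T: float = 3.0):
    # Standard softened-logit KD loss: each student mimics the fusion
    # teacher's outputs; the teacher is detached so students do not
    # back-propagate through it here.
    p_t = F.softmax(teacher_logits.detach() / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T
```

In this sketch the teacher is trained jointly with the students on the ground-truth labels, so its quality grows with theirs, matching the collaborative setting described in the abstract; the cross-layer feature matching inside each student is omitted for brevity.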