Transferring Knowledge to Smaller Network with Class-Distance Loss

24 Nov 2024 (modified: 14 Mar 2017), ICLR 2017, Readers: Everyone
Abstract: Training a small-capacity network that performs as well as a larger-capacity network is an important problem for real-life applications that require fast inference and a small memory footprint. Previous approaches that transfer knowledge from a bigger network to a smaller one show little benefit when applied to state-of-the-art convolutional neural network architectures such as Residual Networks trained with batch normalization. We propose a class-distance loss that helps the teacher network form a densely clustered vector space, making it easier for the student network to learn from it. We show that a small network half the size of the original network, trained with the proposed strategy, can perform close to the original network on the CIFAR-10 dataset.
Conflicts: lunit.io
