Abstract: Knowledge distillation is effective for training small and generalisable network models that meet low-memory and fast-execution requirements. Existing offline distillation methods rely on a strong pre-trained teacher, which enables favourable knowledge discovery and transfer but requires a complex two-phase training procedure. Online counterparts address this limitation at the price of lacking a high-capacity teacher. In this work, we present an On-the-fly Native Ensemble (ONE) learning strategy for one-stage online distillation. Specifically, ONE trains only a single multi-branch network while simultaneously establishing a strong teacher on-the-fly to enhance the learning of the target network. Extensive evaluations show that ONE improves the generalisation performance of a variety of deep neural networks more significantly than alternative methods on four image classification datasets: CIFAR10, CIFAR100, SVHN, and ImageNet, whilst offering computational efficiency advantages.
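To make the strategy in the abstract concrete, the following PyTorch sketch illustrates one plausible reading of it: a shared trunk feeds several branches, a gated combination of the branch logits acts as the on-the-fly ensemble teacher, and each branch is trained with cross-entropy plus a KL distillation term towards that teacher. The module names (`ONESketch`, `one_loss`), the trunk and gate design, the branch count, the temperature, and the loss weighting are illustrative assumptions, not the paper's exact architecture or objective.

```python
# Minimal sketch of ONE-style one-stage online distillation (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ONESketch(nn.Module):
    def __init__(self, num_classes=10, num_branches=3, feat_dim=64):
        super().__init__()
        # Shared low-level trunk (a tiny CNN here purely for illustration).
        self.trunk = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.BatchNorm2d(feat_dim), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Identical branches, each producing its own class logits.
        self.branches = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_branches)]
        )
        # Gate that weights the branch logits to form the on-the-fly teacher.
        self.gate = nn.Linear(feat_dim, num_branches)

    def forward(self, x):
        feat = self.trunk(x)
        branch_logits = torch.stack([b(feat) for b in self.branches], dim=1)  # (B, K, C)
        gate_w = F.softmax(self.gate(feat), dim=1).unsqueeze(-1)              # (B, K, 1)
        teacher_logits = (gate_w * branch_logits).sum(dim=1)                  # (B, C)
        return branch_logits, teacher_logits


def one_loss(branch_logits, teacher_logits, targets, T=3.0):
    """Cross-entropy on every head plus KL distillation from the gated ensemble."""
    loss = F.cross_entropy(teacher_logits, targets)
    for k in range(branch_logits.size(1)):
        logits_k = branch_logits[:, k]
        loss = loss + F.cross_entropy(logits_k, targets)
        # Distil the (detached) ensemble teacher into each branch.
        loss = loss + F.kl_div(
            F.log_softmax(logits_k / T, dim=1),
            F.softmax(teacher_logits.detach() / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
    return loss


# Usage: a single one-stage training step on a dummy batch.
model = ONESketch()
x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
branch_logits, teacher_logits = model(x)
loss = one_loss(branch_logits, teacher_logits, y)
loss.backward()
```

Because the teacher is just a gated aggregate of the branches, distillation happens in the same forward/backward pass as ordinary supervised training, which is what removes the separate pre-training phase required by offline distillation.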