- Abstract: This paper proposes a versatile and powerful training algorithm named Feature-level Ensemble Effect for knowledge Distillation(FEED), which is inspired by the work of factor transfer. The factor transfer is one of the knowledge transfer methods that improves the performance of a student network with a strong teacher network. It transfers the knowledge of a teacher in the feature map level using high-capacity teacher network, and our training algorithm FEED is an extension of it. FEED aims to transfer ensemble knowledge, using either multiple teachers in parallel or multiple training sequences. Adapting the peer-teaching framework, we introduce a couple of training algorithms that transfer ensemble knowledge to the student at the feature map level, both of which help the student network find more generalized solutions in the parameter space. Experimental results on CIFAR-100 and ImageNet show that our method, FEED, has clear performance enhancements,without introducing any additional parameters or computations at test time.
- Keywords: Knowledge Distillation, Ensemble Effect, Knowledge Transfer