Abstract: Feature distillation consistently leads to significant performance improvements, but it requires an extra training budget. To address this problem, we propose TFD, a simple and effective Teacher-Free Distillation framework, which seeks to reuse the privileged features within the student network itself. Specifically, TFD squeezes feature knowledge from the deeper layers into the shallow ones by minimizing a feature loss. Thanks to the narrow gap between these self-features, TFD only needs a simple ℓ2 loss without complex transformations. Extensive experiments on recognition benchmarks show that our framework achieves superior performance to teacher-based feature distillation methods. On the ImageNet dataset, our approach achieves a 0.8% gain for ResNet18, surpassing other state-of-the-art training techniques.
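Below is a minimal sketch (in PyTorch) of the self-feature distillation idea described in the abstract: a deeper-layer feature map supervises a shallower one through a plain ℓ2 loss, with no teacher network. The hook points, the assumption that the two feature maps already share the same shape, and the loss weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def self_feature_loss(shallow_feat: torch.Tensor, deep_feat: torch.Tensor) -> torch.Tensor:
    """Plain L2 (MSE) loss pushing a shallow feature map toward a deeper one."""
    # Treat the deeper feature as the "teacher" signal: stop its gradient so
    # only the shallow layers are updated by this term.
    target = deep_feat.detach()
    # Assumption: the two feature maps have already been brought to the same
    # shape; the abstract states no complex transformation is required.
    return F.mse_loss(shallow_feat, target)

def training_step(model, images, labels, alpha: float = 1.0):
    # Hypothetical interface: the student returns logits plus two intermediate
    # feature maps. The actual layer choices and weighting are assumptions.
    logits, shallow_feat, deep_feat = model(images)
    task_loss = F.cross_entropy(logits, labels)
    distill_loss = self_feature_loss(shallow_feat, deep_feat)
    return task_loss + alpha * distill_loss
```

In this sketch the total loss is simply the task loss plus a weighted feature loss; the key point, mirroring the abstract, is that the distillation target comes from the student's own deeper layers rather than from a separate teacher model.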