Abstract: In real teaching scenarios, an excellent teacher always teaches what they are good at but the student is not, which gives the student the best assistance in making up for their weaknesses and becoming a good student overall. Inspired by this, we introduce the “Teaching what you Should Teach” strategy into a knowledge distillation framework, and propose a data-based distillation method named “TST” that searches for desirable augmented samples to make distillation more efficient and rational. Specifically, we design a neural network-based data augmentation module with a prior bias that learns augmentation magnitudes and probabilities, generating samples that match the teacher’s strengths but the student’s weaknesses. By alternately training the data augmentation module and the generalized distillation paradigm, we learn a student model with excellent generalization ability. To verify the effectiveness of our method, we conduct extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-100, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.
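The alternating scheme sketched in the abstract can be made concrete in pseudocode. The following is a minimal PyTorch sketch based only on the description above, not the paper’s actual implementation: the toy augmentation operations, the specific adversarial objective (teacher cross-entropy minus the teacher-student KD gap), the temperature, and all names (`AugModule`, `alternating_step`, `kd_loss`) are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(s_logits, t_logits, T=4.0):
    """Standard softened-KL distillation loss (the temperature T is assumed)."""
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T

class AugModule(torch.nn.Module):
    """Toy augmentation policy with learnable magnitudes and probabilities."""
    def __init__(self):
        super().__init__()
        self.magnitudes = torch.nn.Parameter(torch.zeros(2))  # per-op strength
        self.probs = torch.nn.Parameter(torch.zeros(2))       # per-op probability

    def forward(self, x):
        p = torch.sigmoid(self.probs)
        m = torch.sigmoid(self.magnitudes)
        # Two differentiable placeholder ops, gated by their probabilities
        # so gradients reach both parameter sets.
        x = x + p[0] * m[0] * 0.5                        # brightness shift
        x = x + p[1] * m[1] * 0.1 * torch.randn_like(x)  # Gaussian noise
        return x.clamp(0, 1)

def alternating_step(x, y, teacher, student, aug, opt_student, opt_aug):
    # Teacher parameters are assumed frozen (requires_grad=False); gradients
    # still flow through its forward pass to the augmentation parameters.

    # Phase 1: update the augmentation module to find samples the teacher
    # predicts well (low CE) but the student disagrees on (high KD gap).
    opt_aug.zero_grad()
    x_hat = aug(x)
    t_logits, s_logits = teacher(x_hat), student(x_hat)
    loss_aug = F.cross_entropy(t_logits, y) - kd_loss(s_logits, t_logits)
    loss_aug.backward()
    opt_aug.step()

    # Phase 2: distill the student on the augmented samples as usual,
    # with the augmentation policy held fixed.
    opt_student.zero_grad()
    with torch.no_grad():
        x_hat = aug(x)
        t_logits = teacher(x_hat)
    s_logits = student(x_hat)
    loss_student = F.cross_entropy(s_logits, y) + kd_loss(s_logits, t_logits)
    loss_student.backward()
    opt_student.step()
```

The sign flip on the KD term in phase 1 is what encodes “teaching what you should teach”: the policy is pushed toward regions where the teacher is confident but the student has not yet caught up, which phase 2 then distills away.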