Teaching what you should teach: a data-based distillation method

Published: 29 Mar 2023, Last Modified: 05 Mar 2025 · IJCAI 2023 · CC BY 4.0
Abstract: In real teaching scenarios, an excellent teacher always teaches what he (or she) is good at but the student is not. This gives the student the best assistance in making up for his (or her) weaknesses and becoming proficient overall. Enlightened by this, we introduce the "Teaching what you Should Teach" strategy into a knowledge distillation framework, and propose a data-based distillation method named "TST" that searches for desirable augmented samples to assist in distilling more efficiently and rationally. Specifically, we design a neural network-based data augmentation module with a priori bias that learns magnitudes and probabilities to generate suitable data samples, so as to find what matches the teacher's strengths but the student's weaknesses. By alternately training the data augmentation module and the generalized distillation paradigm, a student model with excellent generalization ability is learned. To verify the effectiveness of our method, we conduct extensive comparative experiments on object recognition, detection, and segmentation tasks. The results on the CIFAR-100, ImageNet-1k, MS-COCO, and Cityscapes datasets demonstrate that our method achieves state-of-the-art performance on almost all teacher-student pairs. Furthermore, we conduct visualization studies to explore what magnitudes and probabilities are needed for the distillation process.
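The abstract describes an alternating scheme: an augmentation module with learnable magnitudes and probabilities is optimized to produce samples matching the teacher's strengths and the student's weaknesses, and the student is then distilled on those samples. The sketch below illustrates one plausible reading of that loop; it is not the authors' code. The toy `AugModule`, the discrepancy-based augmentation objective, and all function names here are hypothetical, and the standard temperature-scaled KD loss (Hinton et al.) stands in for whatever distillation objective the paper actually uses.

```python
# Hedged sketch of the alternating "teach what you should teach" idea.
# Assumptions: a classification setting, a frozen pretrained teacher, and
# a differentiable augmenter; all names below are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AugModule(nn.Module):
    """Tiny differentiable augmenter with learnable magnitudes/probabilities."""
    def __init__(self):
        super().__init__()
        self.mag = nn.Parameter(torch.zeros(2))    # per-op magnitude logits
        self.prob = nn.Parameter(torch.zeros(2))   # per-op probability logits

    def forward(self, x):
        m = torch.tanh(self.mag)                   # magnitudes in (-1, 1)
        p = torch.sigmoid(self.prob)               # soft apply-probabilities
        x = x + p[0] * m[0]                        # brightness-style shift
        x = x * (1.0 + p[1] * m[1])                # contrast-style scale
        return x

def kd_loss(s_logits, t_logits, T=4.0):
    """Standard temperature-scaled KL distillation loss."""
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T

def tst_step(x, y, teacher, student, aug, opt_student, opt_aug):
    # (1) Augmentation step: steer the augmenter toward samples the teacher
    # still classifies correctly (its strength) while the teacher-student
    # discrepancy is large (the student's weakness). The teacher is assumed
    # frozen; gradients flow to the augmenter through the inputs.
    x_a = aug(x)
    t_logits, s_logits = teacher(x_a), student(x_a)
    gap = kd_loss(s_logits, t_logits)              # teacher-student discrepancy
    aug_loss = F.cross_entropy(t_logits, y) - gap  # keep teacher right, gap big
    opt_aug.zero_grad()
    aug_loss.backward()
    opt_aug.step()

    # (2) Distillation step: train the student on the refreshed samples.
    x_a = aug(x).detach()
    s_logits = student(x_a)
    with torch.no_grad():
        t_logits = teacher(x_a)
    stu_loss = kd_loss(s_logits, t_logits) + F.cross_entropy(s_logits, y)
    opt_student.zero_grad()
    stu_loss.backward()
    opt_student.step()
    return stu_loss.item()
```

Under this reading, the two optimizers are stepped alternately each iteration, so the augmenter continually re-targets the regions where the teacher can still "teach" and the student has the most to gain; the actual objective, prior bias, and augmentation operations in TST may differ.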