Dy-KD: Dynamic Knowledge Distillation for Reduced Easy Examples
Abstract: Knowledge distillation is usually performed by encouraging a small model (the student) to mimic the knowledge of a large model (the teacher). Current knowledge distillation methods mainly focus on the extraction and transfer of knowledge while ignoring the differing importance of examples in the dataset, assigning equal weight to each example. To alleviate this problem, in this paper we propose Dynamic Knowledge Distillation (Dy-KD), which incorporates a curriculum strategy to selectively discard easy examples during knowledge distillation. Specifically, we estimate the difficulty of each example from the predictions of the superior teacher network and divide the examples in a dataset into easy and hard examples. These examples are then assigned different weights to adjust their contributions to the knowledge transfer. We validate Dy-KD on CIFAR-100 and Tiny-ImageNet.
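As a rough illustration of the idea described above, the following PyTorch sketch shows one way a difficulty-weighted distillation loss could be written. The difficulty measure (the teacher's confidence on the ground-truth class), the confidence threshold, the easy/hard weight values, and the function name `dykd_loss` are all assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dykd_loss(student_logits, teacher_logits, labels, T=4.0,
              easy_threshold=0.9, easy_weight=0.1, hard_weight=1.0):
    """Sketch of a difficulty-weighted KD loss (illustrative assumptions only)."""
    # Use the teacher's probability on the true class as a proxy for difficulty:
    # high confidence -> easy example, low confidence -> hard example.
    teacher_probs = F.softmax(teacher_logits, dim=1)
    conf_on_label = teacher_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    is_easy = conf_on_label >= easy_threshold

    # Per-example weights: easy examples contribute less to the knowledge transfer.
    weights = torch.where(is_easy,
                          torch.full_like(conf_on_label, easy_weight),
                          torch.full_like(conf_on_label, hard_weight))

    # Standard temperature-scaled KD term, kept per-example so it can be weighted.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="none").sum(dim=1) * (T * T)

    return (weights * kd).sum() / weights.sum()
```

In this sketch, setting `easy_weight` to zero would correspond to discarding easy examples outright, while a small nonzero value merely reduces their influence; the actual schedule and weighting used by Dy-KD are given in the paper.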