Abstract: Knowledge Distillation (KD), which transfers semantic knowledge from a parameter-heavy teacher network to a more compact student network, has been widely and successfully applied across many computer vision tasks, such as image/video recognition and segmentation. Unlike traditional KD methods that primarily enforce consistency of intermediate features in the spatial domain, we propose a novel Multi-scale Frequency-Driven Knowledge Distillation (MFD-KD) framework that exploits information in the frequency domain. Specifically, our method applies the Fast Fourier Transform (FFT) to shift intermediate spatial-domain feature maps into the corresponding frequency domain, enabling our approach to extract crucial high- and low-frequency information both inside and outside the square center of the frequency layer, while also minimizing interference from non-semantic information such as noise. Furthermore, we develop a logit-layer similarity distillation loss to further leverage the category information of the teacher network, thereby enhancing the performance of the student network. Extensive experiments with various architectures, including Convolutional Neural Network (CNN)-based and Transformer-based networks, on public datasets such as CIFAR-100, ImageNet, and MS-COCO show that our method significantly improves the performance of these models. The code is available at https://github.com/ronglianghuang/MFD-KD.
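The frequency-domain step described above — FFT-shifting a feature map and separating the low-frequency content inside a centered square from the high-frequency content outside it — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the square radius and the NumPy-based single-map form are assumptions for clarity.

```python
# Hedged sketch: splitting a 2-D feature map into low- and high-frequency
# components via a centered square mask in the shifted FFT spectrum.
# The mask radius is a hypothetical choice, not taken from the paper.
import numpy as np

def frequency_split(feat, radius=4):
    """Return (low, high) spatial-domain components of `feat`.

    FFT -> shift the zero-frequency bin to the center -> treat the square
    of side 2*radius around the center as low frequency and everything
    outside it as high frequency -> inverse-transform each band.
    """
    f = np.fft.fftshift(np.fft.fft2(feat))
    h, w = feat.shape
    mask = np.zeros((h, w), dtype=bool)
    cy, cx = h // 2, w // 2
    mask[cy - radius:cy + radius, cx - radius:cx + radius] = True
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * ~mask)).real
    return low, high

feat = np.random.default_rng(0).standard_normal((32, 32))
low, high = frequency_split(feat)
# The two bands reconstruct the original map up to floating-point error.
assert np.allclose(low + high, feat)
```

A distillation loss in this spirit would then compare the student's and teacher's `low`/`high` bands separately, so noisy high-frequency content can be weighted differently from the semantic low-frequency content.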