Optimizing Knowledge Distillation via Shallow Texture Knowledge Transfer

Xinlei Huang, Jialiang Tang, Haifeng Qing, Honglin Zhu, Ning Jiang, Wenqing Wu, Peng Zhang

Published: 2022, Last Modified: 24 Mar 2026, Venue: ICONIP (4) 2022, License: CC BY-SA 4.0

Abstract: Knowledge distillation (KD) is a widely used model compression technique for training a strong small network, called the student network. KD encourages the student network to mimic knowledge from the middle or deep layers of a large network, called the teacher network. In general, existing knowledge distillation methods neglect the shallow features of neural networks, which contain informative texture knowledge. In this paper, we propose Shallow Texture Knowledge Distillation (SeKD) to distill these informative shallow features. We also investigate traditional machine learning methods and adopt the Gradient Local Binary Pattern (GLBP) for shallow feature extraction. However, we found that using GLBP to process shallow features introduces an additional computational burden. To reduce this cost, we design a texture attention module that optimizes shallow feature extraction for distillation. We conducted extensive experiments to evaluate the effectiveness of the proposed method. When trained on the CIFAR-10 and CIFAR-100 datasets, the student network WideResNet16-2 trained with SeKD achieves accuracies of 94.35% and 75.90%, respectively.
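The general idea of combining output-level distillation with a shallow-feature matching term can be sketched as below. This is a generic illustration, not the SeKD loss: the abstract gives neither the loss formulation nor the details of GLBP or the texture attention module. The sketch assumes the classic temperature-softened KL term plus a plain mean squared error between flattened shallow features; the function names, `temperature`, and the weight `beta` are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened outputs, scaled by temperature^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


def feature_mse(student_feat, teacher_feat):
    """Mean squared error between two flattened feature maps of equal length."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature maps must have the same number of elements")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def distillation_loss(student_logits, teacher_logits,
                      student_shallow, teacher_shallow,
                      temperature=4.0, beta=1.0):
    """Assumed combination: soft-target term plus beta-weighted shallow-feature term."""
    output_term = soft_target_loss(student_logits, teacher_logits, temperature)
    shallow_term = feature_mse(student_shallow, teacher_shallow)
    return output_term + beta * shallow_term
```

In practice the shallow features would come from an early layer of each network, and a method such as SeKD would process them (for example with a texture descriptor or an attention module) before comparing them; the plain MSE here only stands in for that step.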

External IDs: dblp:conf/iconip/HuangTQZJWZ22