A coded knowledge distillation framework for image classification based on adaptive JPEG encoding

Published: 01 Jan 2025 · Last Modified: 31 Mar 2025 · Pattern Recognition Journal · CC BY 4.0
Abstract: In knowledge distillation (KD), a lightweight student model achieves improved test accuracy by mimicking the behavior of a pre-trained large model (the teacher). However, the cumbersome teacher often produces over-confident responses, resulting in poor generalization on unseen data, and a student trained by such a teacher inherits this problem. To mitigate this issue, we present a new KD framework dubbed coded knowledge distillation (CKD), in which the student is instead trained to mimic the behavior of a coded teacher. Compared to the teacher in KD, the coded teacher in CKD has an additional adaptive encoding layer at the front, which adaptively encodes an input image into a compressed version (using JPEG encoding, for instance) and then feeds the compressed image to the pre-trained teacher. Comprehensive experimental results demonstrate the effectiveness of CKD over KD. In addition, we extend the deployment of a coded teacher to other knowledge transfer methods, showing that it improves their test accuracy as well.
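To make the pipeline concrete, the following is a minimal sketch (assuming a PyTorch setup with a frozen teacher network) of how a coded teacher could be wired into a standard KD objective. The fixed `quality` argument stands in for the paper's adaptive encoding step, whose selection rule is not described in the abstract; the function names and the hyperparameters `T` and `alpha` are illustrative, not taken from the paper.

```python
import io
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms import functional as TF


def jpeg_compress(image_tensor, quality):
    """Round-trip a [C, H, W] tensor in [0, 1] through JPEG at the given quality."""
    pil_img = TF.to_pil_image(image_tensor)
    buffer = io.BytesIO()
    pil_img.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return TF.to_tensor(Image.open(buffer))


def coded_teacher_logits(teacher, images, quality=50):
    """Feed JPEG-compressed inputs to the frozen teacher (the 'coded teacher').

    Note: the paper chooses the compression adaptively; a fixed quality is
    used here only for illustration.
    """
    coded = torch.stack([jpeg_compress(img, quality) for img in images.cpu()])
    with torch.no_grad():
        return teacher(coded.to(images.device))


def ckd_loss(student_logits, coded_logits, labels, T=4.0, alpha=0.9):
    """Standard KD loss, but distilling from the coded teacher's softened outputs."""
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(coded_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```

In a training loop, the student would be updated with `ckd_loss(student(images), coded_teacher_logits(teacher, images), labels)`, replacing the uncompressed teacher forward pass used in vanilla KD.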