Temperature-driven category decoupled knowledge distillation with interpretability for model compression

Ying Chen

Published: 31 Dec 2025, Last Modified: 29 Jan 2026, OpenReview Archive Direct Upload, Everyone, CC BY 4.0

Abstract: In practical engineering applications, knowledge distillation (KD) can effectively compress large models, allowing them to be deployed on edge devices with limited computational resources. By exploring the class-level challenges of training in KD, a temperature-driven category decoupled (TDCD) knowledge distillation method with interpretability is proposed. Unlike the original decoupled KD, which decouples the classical logits into target-class and non-target-class parts, TDCD investigates how the different classes vary with the KD temperature and reformulates the logits into similar classes and dissimilar classes. A rigorous mathematical analysis is presented for interpretability, theoretically demonstrating the rationality of category decoupling using temperature stimuli. The classical logits distillation can then be decoupled into similar-class KD and dissimilar-class KD to address the problem of class confusion. Furthermore, the proposed method can be plugged into state-of-the-art (SOTA) distillation approaches to further enhance model performance. Experiments demonstrate that the proposed method achieves an average classification accuracy improvement of 1.26% and a maximum improvement of 3.51% over the baselines (without TDCD embedded) on multi-class classification tasks. Additionally, TDCD attains the highest average precision on the MS-COCO dataset for object detection and achieves the best classification accuracy on the RTE dataset for the textual entailment task.
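To make the phrase "variation tendency of different classes with the KD temperature" concrete, the following minimal sketch uses the standard temperature-scaled softmax to show how each class probability moves as the temperature rises. The logits, the two temperatures, and the helper names `softmax_t` and `temperature_trend` are hypothetical choices for illustration. The abstract does not state the rule by which similar and dissimilar classes are assigned, so the "rising" and "falling" groups below only illustrate the tendency and should not be read as the paper's decoupling criterion.

```python
import math


def softmax_t(logits, temperature):
    """Temperature-scaled softmax: p_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_trend(logits, t_low, t_high):
    """Per-class change in probability when T goes from t_low to t_high."""
    p_low = softmax_t(logits, t_low)
    p_high = softmax_t(logits, t_high)
    return [high - low for low, high in zip(p_low, p_high)]


# Hypothetical teacher logits for a 4-class problem (not taken from the paper).
teacher_logits = [6.0, 4.5, 1.0, -2.0]

# Assumption: compare T = 1 with T = 4; any pair with t_low < t_high works.
trend = temperature_trend(teacher_logits, 1.0, 4.0)

# Illustrative grouping only: classes whose probability rises vs. falls with T.
rising = [i for i, d in enumerate(trend) if d > 0]
falling = [i for i, d in enumerate(trend) if d <= 0]

print("trend:", [round(d, 4) for d in trend])
print("rising:", rising, "falling:", falling)
```

For a softmax, the derivative of log p_i with respect to T has the sign of (E_p[z] - z_i), so classes whose logit lies below the probability-weighted mean logit gain mass as T grows, while the others lose it. This is the kind of class-dependent temperature behavior the abstract refers to.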