Fully logits guided distillation with intermediate decision learning for deep model compression

Ying Chen

Published: 28 Aug 2025, Last Modified: 29 Jan 2026 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0

Abstract: Knowledge distillation, as a model compression technique, has been widely applied in artificial intelligence to improve the efficiency of deep learning models, especially in resource-constrained environments. Because logits carry more decision information than intermediate feature maps, fully logits guided distillation is proposed, which gives the student network better access to guidance from both the intermediate and decision levels of the teacher network. Intermediate feature logicalization is designed, which performs logit transformations on the intermediate feature maps to obtain intermediate decision information. A logit matrisation strategy is proposed, which aims to capture the inter-class information in the logits. Furthermore, cross-layer distillation is presented so that the final logits of the teacher can guide the intermediate layers of the student. The proposed mechanism can be embedded into state-of-the-art distillation frameworks to further improve their accuracy. Experiments conducted on the CIFAR-10, CIFAR-100, and ImageNet datasets demonstrate the effectiveness of the proposed method. With image classification accuracy as the evaluation metric, the results show that the proposed method improves accuracy by an average of 1.84%, with a best improvement of 3.19% over the baseline models.
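The cross-layer idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each intermediate student layer has already been mapped to class logits by some auxiliary head (the abstract does not specify the transformation), and that guidance is given by the standard temperature-scaled KL divergence used in conventional knowledge distillation. The names `softmax`, `kl_divergence`, and `cross_layer_kd_loss`, the temperature value, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of class logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_layer_kd_loss(teacher_final_logits, student_layer_logits, temperature=4.0):
    """Average KL between the teacher's final softened prediction and the
    softened prediction of every student layer (intermediate and final).

    Assumption: student_layer_logits holds one logit vector per student
    layer, produced by hypothetical auxiliary heads.
    """
    p_teacher = softmax(teacher_final_logits, temperature)
    losses = [
        kl_divergence(p_teacher, softmax(layer_logits, temperature))
        for layer_logits in student_layer_logits
    ]
    # The T^2 factor is the usual gradient rescaling in distillation.
    return (temperature ** 2) * sum(losses) / len(losses)


# Hypothetical logits for a 3-class problem.
teacher_final = [2.0, 0.5, -1.0]
student_layers = [
    [0.3, 0.2, 0.1],    # shallow layer: nearly uniform prediction
    [1.2, 0.4, -0.5],   # deeper layer: closer to the teacher
    [1.9, 0.6, -0.9],   # final student logits
]

loss_mismatched = cross_layer_kd_loss(teacher_final, student_layers)
loss_matched = cross_layer_kd_loss(teacher_final, [teacher_final] * 3)
```

In this sketch `loss_matched` is zero because every student layer reproduces the teacher's logits, while `loss_mismatched` is positive. In practice such a term would be added to the ordinary cross-entropy loss on the ground-truth labels, with a weighting that this sketch leaves out.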