Knowledge Distillation for Semantically Inconsistent Data

19 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Knowledge Distillation, Model Compression, Data Importance, Semantic Inconsistency, Image Classification
TL;DR: This paper proposes a general enhancement technique for existing knowledge distillation methods.
Abstract: Knowledge Distillation (KD) is a widely used technique for model compression, transferring knowledge from a large teacher model to a smaller student model. While most existing KD approaches focus on aligning outputs (logits) or intermediate features between the teacher and student models, they typically overlook a key observation: not all training samples contribute equally to the knowledge transfer process. To address this limitation, this paper introduces a novel method to identify data that exhibit semantic inconsistency between the input space and the feature space. By adaptively assigning higher weights to these semantically inconsistent samples during student model learning, the proposed method refines the teacher's knowledge to better align with the student's needs, thereby improving the overall knowledge distillation process. To demonstrate the general effectiveness of the proposed method, we embed it into several popular KD frameworks and evaluate it extensively across a diverse set of teacher and student architectures. The experimental results show that the proposed method significantly boosts knowledge distillation performance and sets new state-of-the-art results on the CIFAR-100, Tiny-ImageNet, and ImageNet datasets. Code is given in the supplementary material.
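The abstract does not spell out how the per-sample weighting enters the distillation objective, so the following is a minimal sketch of one plausible instantiation: a standard temperature-scaled KD loss (Hinton et al., 2015) where each sample's KL term is scaled by a weight reflecting its semantic inconsistency. The function name `weighted_kd_loss`, the `weights` argument, and the temperature value are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def weighted_kd_loss(student_logits: torch.Tensor,
                     teacher_logits: torch.Tensor,
                     weights: torch.Tensor,
                     T: float = 4.0) -> torch.Tensor:
    """Per-sample weighted KD loss (illustrative sketch, not the paper's code).

    weights: shape (batch,), larger for samples deemed semantically
    inconsistent by whatever scoring rule the method defines (hypothetical).
    """
    # Soften both distributions with temperature T, as in standard KD.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # Per-sample KL divergence, scaled by T^2 to preserve gradient magnitudes.
    kl_per_sample = F.kl_div(log_p_student, p_teacher,
                             reduction="none").sum(dim=1) * (T ** 2)
    # Emphasize semantically inconsistent samples via the adaptive weights.
    return (weights * kl_per_sample).mean()


# Usage sketch: weights would come from the paper's inconsistency measure;
# here they are random placeholders purely to show the call signature.
if __name__ == "__main__":
    s = torch.randn(8, 100)          # student logits (batch, classes)
    t = torch.randn(8, 100)          # teacher logits
    w = torch.rand(8)                # hypothetical inconsistency weights
    print(weighted_kd_loss(s, t, w))
```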
Supplementary Material: zip
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 15053