DMTD: Dynamic Multi-Temperature Distillation

ICLR 2026 Conference Submission 17441 Authors

19 Sept 2025 (modified: 08 Oct 2025), CC BY 4.0
Keywords: Knowledge Distillation, Dynamic Multi-Temperature, Logit Distillation
Abstract: Knowledge distillation compresses complex neural networks into simpler architectures. In recent years, logit-based distillation has gained significant attention due to its computational efficiency. However, many logit distillation techniques use a single fixed temperature, which makes the transferred knowledge sensitive only to global features. In contrast, many computer vision tasks require attention to both global and local features simultaneously. This mismatch limits the effectiveness of logit distillation in computer vision applications. To this end, this paper introduces a modular approach, Dynamic Multi-Temperature Distillation (DMTD), which employs multiple learnable temperatures and adaptively adjusts them according to the significance of global and local features. This enables the student model to better mimic the teacher's hidden behavior at inference time. Experimental results show that DMTD integrates effectively with existing logit distillation methods, yielding significant improvements across various teacher-student pairs on benchmark datasets for image classification and object detection.
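The abstract describes a loss built from several learnable temperatures rather than one fixed value. The sketch below illustrates one plausible reading of that idea: a set of learnable temperatures, a lightweight gate that weights them per sample, and a weighted sum of temperature-scaled KL terms. The gating network, temperature initialization, and weighting scheme are illustrative assumptions, not the authors' exact DMTD formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTemperatureKD(nn.Module):
    """Minimal sketch of a multi-temperature logit-distillation loss (assumed design)."""

    def __init__(self, num_classes: int, num_temps: int = 4, init_temp: float = 4.0):
        super().__init__()
        # One learnable temperature per branch, stored in log space to stay positive.
        self.log_temps = nn.Parameter(torch.full((num_temps,), float(init_temp)).log())
        # Hypothetical gate: weights each temperature from the teacher's logits.
        self.gate = nn.Linear(num_classes, num_temps)

    def forward(self, student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
        temps = self.log_temps.exp()                                  # (num_temps,)
        weights = torch.softmax(self.gate(teacher_logits), dim=-1)    # (batch, num_temps)
        loss = 0.0
        for i, t in enumerate(temps):
            log_p_s = F.log_softmax(student_logits / t, dim=-1)
            p_t = F.softmax(teacher_logits / t, dim=-1)
            # Standard KD term at temperature t, scaled by t^2, kept per-sample.
            kl = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=-1) * (t ** 2)
            loss = loss + (weights[:, i] * kl).mean()
        return loss


# Usage: combine with the usual cross-entropy on ground-truth labels, e.g.
# kd = MultiTemperatureKD(num_classes=100)
# loss = ce_loss + alpha * kd(student_logits, teacher_logits.detach())
```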
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 17441