Abstract: Knowledge distillation is a transfer learning and model-compression technique that aims to transfer the hidden knowledge of a teacher model
to a student model. However, this transfer often leads to a poorly calibrated student model, which can be problematic for high-risk
applications that require well-calibrated models to capture prediction uncertainty. To address this issue, we propose a simple and
novel technique that enhances the calibration of the student network by using an ensemble of well-calibrated teacher models. We
train multiple teacher models using various data-augmentation
techniques such as Cutout, mixup, CutMix, and AugMix, and use
their ensemble for knowledge distillation. We evaluate our approach
on different teacher-student combinations using CIFAR-10 and
CIFAR-100 datasets. Our results demonstrate that our technique
improves calibration metrics (such as expected calibration error and overconfidence error) while also increasing the accuracy of the student
network.
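To make the distillation step described above concrete, the following is a minimal PyTorch sketch of distilling from an ensemble of teachers by averaging their softened predictions. The function name `ensemble_distillation_loss`, the temperature `T`, and the weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of ensemble-teacher knowledge distillation (assumed setup:
# teachers pre-trained with Cutout, mixup, CutMix, AugMix; T and alpha are
# illustrative hyperparameters, not values from the paper).
import torch
import torch.nn.functional as F


def ensemble_distillation_loss(student_logits, teacher_logits_list, labels,
                               T=4.0, alpha=0.9):
    """KD loss against the averaged soft targets of an ensemble of teachers."""
    # Average the temperature-softened probabilities of all teachers.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]).mean(dim=0)
    # KL divergence between the student's soft predictions and the ensemble target.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  teacher_probs, reduction="batchmean") * (T * T)
    # Standard cross-entropy on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


# Example usage with random tensors (CIFAR-10: 10 classes, 4 teachers).
if __name__ == "__main__":
    batch, classes = 8, 10
    student_logits = torch.randn(batch, classes, requires_grad=True)
    teacher_logits = [torch.randn(batch, classes) for _ in range(4)]
    labels = torch.randint(0, classes, (batch,))
    loss = ensemble_distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(loss.item())
```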