Abstract: Knowledge distillation is a transfer learning and model-compression technique that aims to transfer the hidden knowledge of a teacher model
to a student model. However, this transfer often leads to a poorly calibrated student model, which can be problematic for high-risk
applications that require well-calibrated models to capture prediction uncertainty. To address this issue, we propose a simple and
novel technique that enhances the calibration of the student network by using an ensemble of well-calibrated teacher models. We
train multiple teacher models using various data-augmentation
techniques such as Cutout, mixup, CutMix, and AugMix, and use
their ensemble for knowledge distillation. We evaluate our approach
on different teacher-student combinations using CIFAR-10 and
CIFAR-100 datasets. Our results demonstrate that our technique
improves calibration metrics (such as expected calibration error and overconfidence error) while also increasing the accuracy of the student
network.
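To make the distillation step described above concrete, the following is a minimal PyTorch sketch of distilling from an ensemble of teachers by averaging their softened predictions. The function name `ensemble_distillation_loss`, the temperature `T`, and the weight `alpha` are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of ensemble-teacher knowledge distillation (assumed setup:
# teachers pre-trained with Cutout, mixup, CutMix, AugMix; T and alpha are
# illustrative hyperparameters, not values from the paper).
import torch
import torch.nn.functional as F


def ensemble_distillation_loss(student_logits, teacher_logits_list, labels,
                               T=4.0, alpha=0.9):
    """KD loss against the averaged soft targets of an ensemble of teachers."""
    # Average the temperature-softened probabilities of all teachers.
    teacher_probs = torch.stack(
        [F.softmax(t / T, dim=1) for t in teacher_logits_list]).mean(dim=0)
    # KL divergence between the student's soft predictions and the ensemble target.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  teacher_probs, reduction="batchmean") * (T * T)
    # Standard cross-entropy on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


# Example usage with random tensors (CIFAR-10: 10 classes, 4 teachers).
if __name__ == "__main__":
    batch, classes = 8, 10
    student_logits = torch.randn(batch, classes, requires_grad=True)
    teacher_logits = [torch.randn(batch, classes) for _ in range(4)]
    labels = torch.randint(0, classes, (batch,))
    loss = ensemble_distillation_loss(student_logits, teacher_logits, labels)
    loss.backward()
    print(loss.item())
```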