Keywords: neural network, calibration, network calibration, cooling, temperature scaling, classification
TL;DR: The paper proposes Cooling, a calibration method applied during neural network training that reduces the need for a learning rate schedule.
Abstract: Modern classification neural networks are notoriously prone to being overly confident in their predictions. Multiple calibration methods have been proposed so far, and they have made noteworthy progress in reducing this overconfidence. However, to the best of our knowledge, prior methods have focused exclusively on the factors that affect calibration, leaving open the reverse question of how (mis)calibration impacts network training. Aiming for a better understanding of this interplay, we propose a temperature-based Cooling method for calibrating classification neural networks during training. Cooling has a substantial effect on the gradients and reduces the need for a learning rate schedule. We investigate different variants of Cooling. The simplest one, last layer Cooling, is also the best-performing one, improving network performance across a range of datasets, network architectures, and hyperparameter settings.
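Illustrative sketch (not from the submission): the abstract does not specify how Cooling sets or applies its temperature, so the snippet below only shows the standard temperature-scaled softmax that temperature-based calibration builds on, and how the temperature enters the cross-entropy gradient with respect to the last-layer logits. The function names, the example logits, and the temperature values are assumptions chosen for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature, computed in a numerically stable way."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits, target, temperature=1.0):
    """Gradient of -log p[target] with respect to the unscaled logits.

    For p = softmax(z / T), dL/dz_i = (p_i - 1[i == target]) / T,
    so the temperature rescales the gradient as well as the probabilities.
    """
    probs = softmax_with_temperature(logits, temperature)
    return [
        (p - (1.0 if i == target else 0.0)) / temperature
        for i, p in enumerate(probs)
    ]


# Hypothetical last-layer logits for a 3-class problem (assumed values).
logits = [4.0, 1.0, 0.5]
for t in (1.0, 2.0, 4.0):
    probs = softmax_with_temperature(logits, t)
    grad = cross_entropy_grad(logits, target=1, temperature=t)
    grad_l1 = sum(abs(g) for g in grad)
    print(f"T={t}: max prob={max(probs):.3f}, grad L1 norm={grad_l1:.3f}")
```

A temperature above 1 flattens the predicted distribution, which lowers the maximum confidence, and the 1/T factor in the gradient shows one way a temperature can interact with the effective step size during training.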
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Deep Learning and representational learning
Supplementary Material: zip