Calibration Bottleneck: What Makes Neural Networks less Calibratable?

21 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Uncertainty Calibration, Post-hoc Calibration
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: While modern deep neural networks have achieved remarkable success, they exhibit a notable deficiency in reliably estimating uncertainty. Many existing studies address the uncertainty calibration problem by incorporating regularization techniques that penalize overconfident outputs during training. In this study, we shift the focus from the miscalibration encountered during training to an investigation of the concept of calibratability, which assesses how amenable a model is to recalibration in the post-training phase. We find that the use of regularization techniques can compromise calibratability, subsequently leading to a decline in final calibration performance after recalibration. To identify the underlying causes of poor calibratability, we delve into the calibration of intermediate features across neural networks' hidden layers. Our study demonstrates that the overtraining of the top layers in neural networks poses a significant obstacle to calibration, even though these layers typically offer minimal improvement to the discriminability of features. Based on this observation, we introduce a weak classifier hypothesis: given a weak classification head, the bottom layers of a neural network can be better learned to produce calibratable features. Consequently, we propose a progressively layer-peeled training (PLT) method to exploit this hypothesis, thereby enhancing model calibratability. Comprehensive experiments demonstrate the effectiveness of our method, which improves model calibration and also yields competitive predictive performance.
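To make the abstract's two key ingredients concrete, here is a minimal, hypothetical PyTorch sketch, not the authors' implementation. The post-training recalibration step is shown as standard temperature scaling (a common post-hoc calibrator), and PLT is approximated as freezing top blocks on a linear schedule; `fit_temperature`, `apply_peeling`, the block/head decomposition, and the training-loop names are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fit_temperature(logits: torch.Tensor, labels: torch.Tensor,
                    lr: float = 0.02, steps: int = 300) -> float:
    """Post-hoc recalibration via temperature scaling: fit a scalar T on
    held-out (logits, labels) by minimizing NLL. "Calibratability" in the
    abstract concerns how well calibrated a model becomes after a step like
    this. Logits are assumed precomputed and detached from the model graph."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so that T > 0
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(logits / log_t.exp(), labels).backward()
        opt.step()
    return log_t.exp().item()


def apply_peeling(blocks: nn.ModuleList, head: nn.Module, progress: float) -> None:
    """Sketch of progressive layer peeling: freeze the topmost units as
    training progresses, so the bottom layers finish training under an
    effectively weak classification head. Frozen units still run in the
    forward/backward pass, so gradients keep reaching the bottom layers;
    only the frozen units' own parameter updates stop.

    `progress` in [0, 1) is the fraction of training completed."""
    units = list(blocks) + [head]            # bottom-to-top, head on top
    n_frozen = round(progress * len(units))  # peel from the top downward
    for i, unit in enumerate(units):
        trainable = i < len(units) - n_frozen
        for p in unit.parameters():
            p.requires_grad = trainable


# Hypothetical usage: `model`, `train_one_epoch`, and the logit tensors are
# placeholders. Peel more top units each epoch, then assess calibratability
# by recalibrating validation logits and scoring (e.g., ECE) on test logits.
#
# for epoch in range(num_epochs):
#     apply_peeling(model.blocks, model.head, progress=epoch / num_epochs)
#     train_one_epoch(model, train_loader)
#
# T = fit_temperature(val_logits, val_labels)
# calibrated_probs = F.softmax(test_logits / T, dim=1)
```

A lower calibration error after the temperature-scaling step indicates a more calibratable model, which is the quantity the abstract argues regularized training can hurt and PLT-style training can preserve.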
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3477