Keywords: model calibration, post-hoc calibration, calibration benchmark
Abstract: Reliable uncertainty calibration is crucial for the safe deployment of deep neural networks in high-stakes settings. While these networks are known to exhibit systematic overconfidence, especially under distribution shifts, the calibration of large-scale vision models, such as ConvNeXt, EVA, and BEiT, has remained underexplored. We comprehensively examine their calibration behavior, uncovering evidence that challenges well-established assumptions. We find that these models are underconfident on in-distribution data, which results in increased calibration error, yet exhibit improved calibration under distribution shifts. This phenomenon is primarily driven by modern training techniques, including massive pretraining and sophisticated regularization and augmentation methods, rather than architectural innovations alone. We also demonstrate that these large-scale models are highly responsive to post-hoc calibration techniques in the in-distribution setting, enabling practitioners to mitigate underconfidence bias effectively. However, these methods become progressively less reliable under severe distribution shifts and can occasionally produce counterproductive effects. Our findings highlight the complex, non-monotonic effects of architectural and training innovations on calibration, challenging established narratives of continuous improvement.
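Note: The abstract refers to post-hoc calibration techniques and calibration error without naming specific methods. The sketch below illustrates one common instance of each, temperature scaling fitted on held-out logits and expected calibration error (ECE) with equal-width confidence bins; it is an illustrative assumption, not the authors' implementation, and all function names are hypothetical.

```python
# Minimal sketch of post-hoc calibration via temperature scaling, plus ECE.
# Assumes PyTorch logits/labels from a held-out in-distribution split.
import torch
import torch.nn.functional as F

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Fit a single temperature T on validation logits by minimizing the
    negative log-likelihood of softmax(logits / T)."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T to keep T > 0
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

def expected_calibration_error(probs: torch.Tensor, labels: torch.Tensor,
                               n_bins: int = 15) -> float:
    """Standard ECE: weighted average gap between confidence and accuracy per bin."""
    conf, preds = probs.max(dim=1)
    correct = preds.eq(labels).float()
    bins = torch.linspace(0, 1, n_bins + 1)
    ece = torch.zeros(1)
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.float().mean() * (correct[mask].mean() - conf[mask].mean()).abs()
    return ece.item()

# Usage (hypothetical tensors): fit T on held-out data, then rescale test logits.
# For an underconfident model, the fitted T is typically below 1, which sharpens
# the softmax and reduces in-distribution calibration error.
# T = fit_temperature(val_logits, val_labels)
# test_probs = F.softmax(test_logits / T, dim=1)
# ece = expected_calibration_error(test_probs, test_labels)
```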
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 18027