Keywords: Confidence Calibration, Temperature Scaling, Logit Margin, Out-of-Distribution Detection, Deep Neural Networks, Deep Learning
TL;DR: We propose SMART, a lightweight post-hoc calibration method that leverages the logit margin to achieve SOTA calibration performance with minimal parameters and extremely limited validation data across diverse architectures and distribution shifts.
Abstract: Deep neural networks often exhibit overconfidence despite their high accuracy. Such miscalibration limits their reliability in safety-critical domains, where trustworthiness is crucial. Post-hoc calibration methods offer a practical solution, but popular approaches such as Temperature Scaling (TS) apply a single corrective parameter to all samples and therefore fail to address the sample-dependent nature of miscalibration. More advanced methods attempt to adapt to sample difficulty, but they often rely on complex, indirectly learned proxies.
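For reference, the single-parameter Temperature Scaling baseline mentioned above can be sketched in NumPy: one temperature T divides every sample's logits and is chosen to minimize NLL on held-out data. This is a minimal illustration under stated assumptions: a grid search stands in for the gradient-based optimizer usually used, and the synthetic data, the helper names `fit_temperature` and `ece`, and the 15-bin ECE are illustrative choices, not details taken from this submission.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def log_softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def fit_temperature(logits, labels, grid=None):
    """Pick ONE temperature for all samples by minimizing validation NLL.

    Grid search is a simplification; gradient-based fitting is more common.
    """
    if grid is None:
        grid = np.logspace(-1, 1, 401)  # candidate T in [0.1, 10]
    idx = np.arange(len(labels))
    losses = [-log_softmax(logits / t)[idx, labels].mean() for t in grid]
    return float(grid[int(np.argmin(losses))])

def ece(probs, labels, n_bins=15):
    """Equal-width-bin expected calibration error of the top-1 confidence."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            total += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(total)

# Synthetic "overconfident" model (illustrative): labels are drawn from
# softmax(z), but the model reports logits 3*z, so the ideal T is about 3.
rng = np.random.default_rng(0)
n, k = 5000, 10
z = 1.5 * rng.normal(size=(n, k))
cdf = softmax(z).cumsum(axis=1)
labels = np.minimum((cdf < rng.random((n, 1))).sum(axis=1), k - 1)  # inverse-CDF sampling
logits = 3.0 * z

t_hat = fit_temperature(logits, labels)
ece_before = ece(softmax(logits), labels)
ece_after = ece(softmax(logits / t_hat), labels)
print(f"T = {t_hat:.2f}, ECE before = {ece_before:.3f}, after = {ece_after:.3f}")
```

Note that the same `t_hat` is applied to every sample, which is exactly the limitation the abstract points to: easy and hard samples receive identical correction.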
In this work, we first identify the logit margin as a direct, simple, and principled indicator of sample hardness, and we provide substantial empirical and theoretical evidence that it is a more effective indicator than existing proxies. We also identify a fundamental flaw in current methods: optimizing Negative Log-Likelihood (NLL) can paradoxically degrade calibration. To resolve this, we introduce Charbonnier–SoftECE, a novel objective with theoretical guarantees that directly minimizes calibration error.
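The two ingredients in this paragraph can be illustrated as follows: the logit margin (largest logit minus second-largest logit) and a smooth, soft-binned calibration error. The abstract does not define Charbonnier–SoftECE, so the objective below is a hypothetical stand-in, assuming Gaussian soft bin membership and a Charbonnier penalty sqrt(x^2 + eps^2) - eps on each bin's accuracy-confidence gap; `charbonnier_soft_ece`, `bandwidth`, and `eps` are illustrative names, not the paper's.

```python
import numpy as np

def logit_margin(logits):
    """Gap between the largest and second-largest logit of each sample."""
    top2 = np.sort(logits, axis=1)[:, -2:]
    return top2[:, 1] - top2[:, 0]

def charbonnier(x, eps=1e-3):
    """Smooth |x|: about x**2 / (2*eps) near 0, about |x| for |x| >> eps, 0 at 0."""
    return np.sqrt(x * x + eps * eps) - eps

def charbonnier_soft_ece(probs, labels, n_bins=15, bandwidth=0.05, eps=1e-3):
    """Illustrative soft-binned ECE with a Charbonnier penalty per bin.

    ASSUMPTION: the Gaussian soft binning and this exact penalty are
    illustrative choices, not the submission's definition of Charbonnier-SoftECE.
    """
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    centers = (np.arange(n_bins) + 0.5) / n_bins
    # Soft assignment of each confidence to every bin; each row sums to 1.
    w = np.exp(-((conf[:, None] - centers[None, :]) ** 2) / (2 * bandwidth**2))
    w /= w.sum(axis=1, keepdims=True)
    mass = w.sum(axis=0)  # soft sample count per bin
    acc_b = (w * correct[:, None]).sum(axis=0) / (mass + 1e-12)
    conf_b = (w * conf[:, None]).sum(axis=0) / (mass + 1e-12)
    return float(np.sum(mass / len(conf) * charbonnier(acc_b - conf_b, eps)))

margins = logit_margin(np.array([[2.0, 0.5, -1.0],    # clear winner: large margin
                                 [0.3, 0.2, 0.1]]))   # near tie: small margin
overconfident = np.tile([0.9, 0.1], (100, 1))  # always 90% confident...
labels = np.array([0, 1] * 50)                 # ...but only 50% correct
print(margins, charbonnier_soft_ece(overconfident, labels))
```

Because the Charbonnier term is smooth at zero, unlike the absolute value in standard ECE, an objective of this shape can be minimized with gradients once it is written in an autodiff framework; NumPy is used here only to keep the sketch self-contained.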
Building on these insights, we propose Sample Margin-Aware Recalibration of Temperature (SMART), a lightweight post-hoc method that learns a minimalistic sample-wise mapping from the logit margin to an optimal temperature, guided by our calibration-centric objective. Extensive experiments show state-of-the-art calibration performance across diverse architectures and datasets with minimal inference-time data consumption. The code is available at: \url{https://anonymous.4open.science/r/SMART-8B11}.
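A sample-wise temperature driven by the logit margin can be sketched as below. The abstract states only that SMART learns a minimalistic mapping from margin to temperature; the softplus form T(m) = t_min + softplus(a*m + b), the names `sample_wise_temperature` and `recalibrate`, and the parameter values are hypothetical assumptions, and no fitting step is shown.

```python
import numpy as np

def sample_wise_temperature(logits, a, b, t_min=0.05):
    """HYPOTHETICAL margin-to-temperature map: T(m) = t_min + softplus(a*m + b).

    The functional form and the parameters (a, b, t_min) are illustrative
    assumptions, not the submission's parameterization.
    """
    top2 = np.sort(logits, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]                  # logit margin per sample
    return t_min + np.logaddexp(0.0, a * margin + b)  # softplus keeps T > t_min

def recalibrate(logits, a, b):
    """Divide each sample's logits by its own temperature, then softmax."""
    t = sample_wise_temperature(logits, a, b)
    scaled = logits / t[:, None]
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum(axis=1, keepdims=True), t

logits = np.array([[4.0, 3.8, 0.0],   # small margin: "hard" sample
                   [9.0, 1.0, 0.0]])  # large margin: "easy" sample
probs, t = recalibrate(logits, a=0.3, b=0.0)  # placeholder values, not fitted
print(t)
print(probs.round(3))
```

Since every temperature is positive, dividing by it never changes the arg-max, so accuracy is untouched and only confidence moves. In a real setup, a and b would be fitted on a small validation set by minimizing a calibration objective; the values here are placeholders.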
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5570