Threshold Calibration: Making All Large Predicted Probabilities Trustworthy

Published: 13 Apr 2026, Last Modified: 13 Apr 2026 · Calibration for Modern AI @ AISTATS 2026 · CC BY 4.0
Keywords: Calibration
Abstract: Modern classifiers increasingly output full predictive distributions (class probabilities). In downstream decision-making, the \emph{largest} predicted probabilities often trigger actions (e.g., medical follow-up when risk exceeds a threshold), so failures of calibration at the high end are particularly costly. At the same time, \emph{full} multiclass calibration becomes statistically intractable for many classes. Motivated by these considerations, we propose \emph{threshold calibration}: calibration of \emph{all} predicted class probabilities above a fixed threshold. We define a corresponding miscalibration functional, give a partition-based debiased estimator with linear-time complexity, and prove a distribution-free consistency result. On the \textsc{Covtype} benchmark, we empirically evaluate threshold calibration across several predictors and post-hoc recalibrators, finding that more expressive recalibrators can substantially reduce threshold miscalibration.
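To make the notion concrete, here is an illustrative sketch, in NumPy, of a simple binned estimate of threshold miscalibration: over all (example, class) pairs whose predicted probability exceeds the threshold, it compares binned average confidence against empirical frequency. This is only a plain plug-in estimator under assumed conventions (equal-width bins, L1 error); the paper's partition-based debiased estimator is a different, more careful construction, and the function name and parameters here are hypothetical.

```python
import numpy as np

def threshold_miscalibration(probs, labels, threshold=0.5, n_bins=10):
    """Binned L1 miscalibration over all class probabilities >= threshold.

    probs:  (n, k) array of predicted class probabilities
    labels: (n,) array of integer class labels
    NOTE: illustrative plug-in estimator, not the paper's debiased one.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    n, k = probs.shape
    # One-vs-rest outcomes for every (example, class) pair.
    outcomes = (labels[:, None] == np.arange(k)).astype(float)
    # Keep only predicted probabilities at or above the threshold.
    mask = probs >= threshold
    p, y = probs[mask], outcomes[mask]
    if p.size == 0:
        return 0.0
    # Partition [threshold, 1] into equal-width bins.
    edges = np.linspace(threshold, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            # Bin weight times |average confidence - empirical frequency|.
            err += sel.mean() * abs(p[sel].mean() - y[sel].mean())
    return err
```

A perfectly calibrated predictor (e.g., predicting 0.9 for a class that occurs 90% of the time) yields an error of zero, while a predictor that outputs probability 1.0 for the wrong class yields the maximal error of 1.0.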
Submission Number: 25