Keywords: Calibration
Abstract: Modern classifiers increasingly output full predictive distributions (class probabilities).
In downstream decision-making, the \emph{largest} predicted probabilities often trigger actions (e.g., medical follow-up when risk exceeds a threshold), so failures of calibration at the high end are particularly costly.
At the same time, \emph{full} multiclass calibration becomes statistically intractable when the number of classes is large.
Motivated by these considerations,
we propose \emph{threshold calibration}: calibration of \emph{all} predicted class probabilities above a fixed threshold.
We define a corresponding miscalibration functional,
give a partition-based debiased estimator that runs in linear time, and prove a distribution-free consistency result.
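(The abstract does not state the condition or the functional explicitly; the following is a minimal sketch, not necessarily the paper's exact definition. For a predictor with class-probability outputs $f_1,\dots,f_K$ and threshold $t$, threshold calibration can be read as requiring $\Pr(Y=k \mid f_k(X)=p)=p$ for every class $k$ and every $p>t$, with a corresponding miscalibration functional such as
\[
\mathcal{M}_t(f) \;=\; \mathbb{E}\!\left[\sum_{k=1}^{K} \mathbf{1}\{f_k(X) > t\}\,\bigl|f_k(X) - \Pr\bigl(Y = k \mid f_k(X)\bigr)\bigr|\right],
\]
which vanishes exactly when all above-threshold probabilities are calibrated; the symbols $\mathcal{M}_t$, $f_k$, and the $\ell_1$ form are illustrative notation only.)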
On the \textsc{Covtype} benchmark, we empirically evaluate threshold calibration across several predictors and post-hoc recalibrators,
finding that more expressive recalibrators
can substantially reduce threshold miscalibration.
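As a concrete illustration, a naive plug-in (non-debiased) binned estimate of threshold miscalibration can be computed in linear time; the sketch below pools all above-threshold predicted probabilities and compares per-bin means. The function name, binning scheme, and $\ell_1$ weighting are assumptions for illustration and do not reproduce the paper's partition-based debiased estimator.

\begin{verbatim}
import numpy as np

def threshold_miscal_plugin(probs, labels, t=0.5, n_bins=10):
    """Naive plug-in binned estimate of threshold miscalibration.

    probs  : (n, K) predicted class probabilities.
    labels : (n,) true class indices in {0, ..., K-1}.
    t      : threshold; only probabilities above t are assessed.
    n_bins : number of equal-width bins partitioning (t, 1].
    """
    n, K = probs.shape
    mask = probs > t                      # above-threshold predictions
    p = probs[mask]                       # pooled predictions (1-D)
    # 0/1 outcome for each pooled (instance, class) prediction
    y = (labels[:, None] == np.arange(K))[mask].astype(float)
    if p.size == 0:
        return 0.0
    edges = np.linspace(t, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        sel = idx == b
        if sel.any():   # bin weight x |avg prediction - avg outcome|
            err += sel.mean() * abs(p[sel].mean() - y[sel].mean())
    return err
\end{verbatim}

Each prediction is visited a constant number of times, so the estimate costs $O(nK)$, linear in the number of pooled predictions.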
Submission Number: 25