Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We propose novel isotonic normalization-aware post-hoc calibration techniques that inherently account for probability normalization, addressing a key limitation of prior approaches.
Abstract: Accurate and reliable probability predictions are essential for multi-class supervised learning tasks, where well-calibrated models enable rational decision-making. While isotonic regression has proven effective for binary calibration, its extension to multi-class problems via one-vs-rest calibration often produces suboptimal results, limiting its practical adoption. In this work, we propose novel isotonic normalization-aware techniques for multi-class calibration, grounded in natural and intuitive assumptions expected by practitioners. Unlike prior approaches, our methods inherently account for probability normalization by either incorporating normalization directly into the optimization process (**NA-FIR**) or modeling the problem as a cumulative bivariate isotonic regression (**SCIR**). Empirical evaluations on a variety of text and image classification datasets across different model architectures reveal that our approach consistently improves log loss and expected calibration error (ECE) metrics. These findings underscore the potential of our approach to enhance non-parametric multi-class calibration practices, offering an adaptable solution for real-world applications.
Lay Summary: Modern machine learning systems often predict probabilities: for example, how likely an image contains a dog, a cat, or a car. But these probabilities can be unreliable, especially when a model must choose between many possible classes. This can lead to poor decisions in high-stakes applications. A common way to fix this is by calibrating the model, i.e., adjusting its probabilities so they better reflect reality. One popular approach is isotonic regression, which fits a simple model to correct the probabilities under just one assumption: that higher scores from the model should correspond to higher actual chances. This works well for binary decisions, but breaks down when applied to multi-class settings. We introduce two new methods (NA-FIR and SCIR) that directly account for the multi-class nature of the problem during training, while remaining relatively simple. We tested our methods on various text and image classification tasks and found that they consistently improved key performance metrics. These techniques offer a simple and effective upgrade for practitioners seeking better-calibrated AI systems, without needing to change their model or training pipeline.
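
To make the baseline concrete, here is a minimal sketch (our illustration, not the authors' code) of the standard one-vs-rest isotonic calibration that the paper identifies as suboptimal, written in Python with scikit-learn. Note that the renormalization happens only after each per-class calibrator is fit in isolation; folding that normalization into the fit itself is the gap NA-FIR and SCIR are designed to close.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def one_vs_rest_isotonic(scores_cal, labels_cal, scores_test):
    """Baseline one-vs-rest isotonic calibration.

    scores_cal:  (n_samples, n_classes) uncalibrated scores on a held-out set
    labels_cal:  (n_samples,) integer class labels for the held-out set
    scores_test: (m_samples, n_classes) scores to calibrate
    """
    n_classes = scores_cal.shape[1]
    calibrators = []
    for k in range(n_classes):
        iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        # Fit class k's scores against the binary "is class k" indicator,
        # ignoring all other classes (the one-vs-rest simplification).
        iso.fit(scores_cal[:, k], (labels_cal == k).astype(float))
        calibrators.append(iso)

    # Apply each per-class calibrator independently to the test scores.
    probs = np.column_stack(
        [calibrators[k].predict(scores_test[:, k]) for k in range(n_classes)]
    )
    # Post-hoc renormalization: the per-class fits never saw the constraint
    # that each row must sum to one, so this step can distort the
    # probabilities each calibrator produced.
    probs = np.clip(probs, 1e-12, None)
    return probs / probs.sum(axis=1, keepdims=True)
```
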
Primary Area: General Machine Learning->Supervised Learning
Keywords: Machine Learning, Calibration, Multi-class, Isotonic Regression, Uncertainty Estimation
Submission Number: 12156