PAC-Bayes Analysis for Recalibration in Classification

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We present a PAC-Bayes analysis for calibration error and propose a new generalization-aware recalibration algorithm.
Abstract: Nonparametric estimation using uniform-width binning is a standard approach for evaluating the calibration performance of machine learning models. However, existing theoretical analyses of the bias induced by binning are limited to binary classification, creating a significant gap with practical applications such as multiclass classification. In addition, many parametric recalibration algorithms lack theoretical guarantees on their generalization performance. To address these issues, we conduct a generalization analysis of calibration error using the probably approximately correct (PAC) Bayes framework. This approach enables us to derive the first optimizable upper bound on generalization error in the calibration context. Building on this theory, we propose a generalization-aware recalibration algorithm. Numerical experiments show that our algorithm enhances the performance of Gaussian-process-based recalibration across various benchmark datasets and models.
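For context, the uniform-width binning estimator mentioned in the abstract is the standard expected calibration error (ECE): predictions are grouped into equal-width confidence bins, and the gap between average confidence and accuracy is averaged across bins. Below is a minimal sketch of that standard top-label ECE estimator, assuming NumPy; the function name and signature are illustrative, not the paper's code or algorithm.

```python
import numpy as np

def binned_ece(confidences, correct, n_bins=15):
    """Uniform-width-binning estimate of top-label expected calibration error:
    |mean confidence - accuracy| per bin, weighted by the bin's sample fraction."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Right-closed bins so a confidence of exactly 1.0 falls in the last bin.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```

In a multiclass setting, `confidences` would typically be the maximum softmax probability for each example and `correct` a 0/1 indicator of whether the predicted class matches the label; this is the nonparametric estimator whose binning-induced bias the paper analyzes beyond the binary case.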
Lay Summary: When machine learning models make predictions, it is important to understand how confident they are and how well that confidence matches reality. This is known as calibration. A common way to evaluate calibration is to divide predictions into bins, but the current theoretical understanding of the errors introduced by this method is mostly limited to simple binary classification problems. Real-world tasks, however, often involve many classes and more complex models. To bridge this gap, we provide a new theoretical framework for analyzing how well calibration metrics generalize beyond the training data, even in multiclass settings. Using tools from PAC-Bayes theory, we derive the first generalization bounds that can be optimized directly during recalibration. Building on this theory, we design a new recalibration algorithm that explicitly accounts for generalization performance. Experiments across a variety of datasets and models show that our approach improves calibration quality compared with existing methods. Our work offers both new theoretical insights and practical tools for building more reliable AI systems.
Primary Area: Theory->Probabilistic Methods
Keywords: Calibration, ECE, expected calibration error, PAC-Bayes
Flagged For Ethics Review: true
Submission Number: 6202