Optimal and Provable Calibration in High-Dimensional Binary Classification: Angular Calibration and Platt Scaling
Keywords: High dimensional statistics
TL;DR: High-dimensional theory for the calibration of binary classifiers
Abstract: We study the fundamental problem of calibrating a linear binary classifier of the form \(\sigma(\hat{w}^\top x)\), where the feature vector \(x\) is Gaussian, \(\sigma\) is a link function, and \(\hat{w}\) is an estimator of the true linear weight \(w_\star\). By interpolating with a noninformative \emph{chance classifier}, we construct a well-calibrated predictor whose interpolation weight depends on the angle \(\angle(\hat{w}, w_\star)\) between the estimator \(\hat{w}\) and the true linear weight \(w_\star\). We establish that this angular calibration approach is provably well-calibrated in a high-dimensional regime where the number of samples and the number of features both diverge at a comparable rate. The angle \(\angle(\hat{w}, w_\star)\) can be consistently estimated. Furthermore, the resulting predictor is uniquely \emph{Bregman-optimal}, minimizing the Bregman divergence to the true label distribution within a suitable class of calibrated predictors.
Our work is the first to provide a calibration strategy that provably satisfies both calibration and optimality properties in high dimensions. Additionally, we identify conditions under which a classical Platt scaling predictor converges to our Bregman-optimal calibrated solution. Under these conditions, Platt scaling inherits the same properties, provably, in high dimensions.
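To make the role of the angle concrete, the following is a minimal sketch under strong simplifying assumptions and is not the construction from the paper: it assumes \(x \sim N(0, I)\), a probit link \(\sigma = \Phi\), and a known \(w_\star\) (in the paper the angle is estimated instead). Under these assumptions, Gaussian conditioning gives \(P(y = 1 \mid \hat{w}^\top x = t) = \Phi(c\,t)\) with \(c = \|w_\star\| \cos\theta \,/\, (\|\hat{w}\| \sqrt{1 + \|w_\star\|^2 \sin^2\theta})\), where \(\theta = \angle(\hat{w}, w_\star)\). The function names `probit` and `angular_probit_calibrator` are hypothetical and chosen for illustration only.

```python
import math
import numpy as np


def probit(t):
    """Standard normal CDF, applied elementwise."""
    t = np.asarray(t, dtype=float)
    return 0.5 * (1.0 + np.vectorize(math.erf)(t / math.sqrt(2.0)))


def angular_probit_calibrator(w_hat, w_star):
    """Slope c such that P(y=1 | w_hat^T x = t) = Phi(c * t).

    Assumptions (illustrative, not taken from the paper):
    x ~ N(0, I), labels drawn with probability Phi(w_star^T x),
    and w_star known rather than estimated.
    """
    w_hat = np.asarray(w_hat, dtype=float)
    w_star = np.asarray(w_star, dtype=float)
    norm_hat = np.linalg.norm(w_hat)
    norm_star = np.linalg.norm(w_star)
    cos_theta = float(w_hat @ w_star) / (norm_hat * norm_star)
    # Guard against tiny negative values caused by floating-point rounding.
    sin2_theta = max(0.0, 1.0 - cos_theta**2)
    return norm_star * cos_theta / (norm_hat * math.sqrt(1.0 + norm_star**2 * sin2_theta))


# Usage: an estimator orthogonal to w_star carries no information,
# so the calibrated prediction collapses to the chance level.
c_orthogonal = angular_probit_calibrator([0.0, 1.0], [1.0, 0.0])
p_orthogonal = probit(c_orthogonal * 3.0)

# An estimator aligned with w_star but twice as long is rescaled by 1/2.
c_aligned = angular_probit_calibrator([2.0, 0.0], [1.0, 0.0])
```

In this toy setting the slope \(c\) shrinks to zero as the angle approaches \(\pi/2\), so the prediction moves toward the constant chance level \(1/2\), which mirrors the interpolation idea described in the abstract. A one-parameter rescaling of the score is also the kind of map that Platt scaling fits from data, which gives some intuition for why the two can coincide under suitable conditions.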
Supplementary Material: zip
Primary Area: Theory (e.g., control theory, learning theory, algorithmic game theory)
Submission Number: 17263