Simple Calibration via Geodesic Kernels

Published: 12 Jun 2025, Last Modified: 12 Jun 2025
Accepted by TMLR
License: CC BY 4.0
Abstract: Deep discriminative approaches, such as decision forests and deep neural networks, have recently found applications in many important real-world scenarios. However, deploying these learning algorithms in safety-critical applications raises concerns, particularly about ensuring calibration in both in-distribution and out-of-distribution regions. Popular in-distribution (ID) calibration methods, such as isotonic regression and Platt's sigmoidal regression, achieve adequate ID calibration, but they are not calibrated over the entire feature space, leading to overconfidence in the out-of-distribution (OOD) region. Conversely, existing OOD calibration methods generally exhibit poor ID calibration. In this paper, we address the ID and OOD calibration problems jointly. We leverage the fact that deep models learn to partition the feature space into a union of polytopes, that is, flat-sided geometric objects. We introduce a geodesic distance to measure the distance between these polytopes, and further distinguish samples within the same polytope using a Gaussian kernel. Our experiments on both tabular and vision benchmarks show that the proposed approaches, namely Kernel Density Forest (KDF) and Kernel Density Network (KDN), obtain well-calibrated posteriors for both ID and OOD samples while largely preserving classification accuracy and extrapolating beyond the training data to handle OOD inputs appropriately.
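To make the polytope picture concrete, here is a minimal sketch in Python, assuming a small fully connected ReLU network. The binary pattern of active ReLU units identifies which polytope a sample falls in; the Hamming-style distance between patterns is an illustrative stand-in for the paper's geodesic distance, and `activation_pattern`, `geodesic_distance`, and `gaussian_kernel` are hypothetical helpers, not the kdg package API.

```python
import numpy as np

def activation_pattern(weights, biases, x):
    """Binary ReLU activation pattern identifying the polytope containing x."""
    pattern, h = [], x
    for W, b in zip(weights, biases):
        pre = W @ h + b
        pattern.append(pre > 0)        # which units fire on this input
        h = np.maximum(pre, 0)         # ReLU forward pass
    return np.concatenate(pattern)

def geodesic_distance(p1, p2):
    """Illustrative proxy: fraction of ReLU units whose activity differs."""
    return np.mean(p1 != p2)

def gaussian_kernel(x, center, sigma=1.0):
    """Gaussian kernel weighting samples that share a polytope."""
    return np.exp(-np.sum((x - center) ** 2) / (2 * sigma ** 2))

# Toy usage: compare the polytopes of two random inputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 3)), rng.standard_normal((4, 5))]
biases = [rng.standard_normal(5), rng.standard_normal(4)]
x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
p1 = activation_pattern(weights, biases, x1)
p2 = activation_pattern(weights, biases, x2)
print(geodesic_distance(p1, p2), gaussian_kernel(x1, x2))
```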
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We sincerely thank TMLR and the action editor for their help and guidance throughout the reviews. Below we describe how we have improved the paper by incorporating the minor revisions suggested by the action editor.

> "With overparameterized networks, which is the norm, they typically exhibit this property that the number of populated polytopes is equal to the sample size."

Reviewer CZ96 suggested that this point should be clearly stated in the paper. We have added the above line to the FAQ section (Page 17), in response to the question "Number of polytopes grows exponentially with the size of the network. Isn't the approach computationally infeasible?"

> It would strengthen the paper to include a discussion on how the proposed method could generalize to more complex datasets and deeper neural networks.

We have added the following lines to the discussion: "The vision experiments in this paper show how the model can be applied to deeper networks by employing a front-end encoder to extract local image features (as explained in the first paragraph of Section 3.2). Instead of applying KDN to the entire ViT, we applied it solely to the final fully connected layers with ReLU activations. This reduces the effective depth of the model, ensuring that KDN can be applied even to encoders that do not rely on ReLU activations." and "However, a key limitation of our approach is the need to store parameters for each populated polytope, which may hinder scalability to very large datasets. One potential solution is to merge nearby polytopes into a single representative polytope." A sketch of this encoder-plus-head setup follows below.
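As a rough illustration of the revision above, here is a minimal PyTorch sketch of restricting the density machinery to a shallow ReLU head on top of a frozen encoder. The class name `ReLUHead`, the layer sizes, and the random tensor standing in for frozen ViT features are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ReLUHead(nn.Module):
    """Shallow fully connected ReLU head; KDN would be fit on these layers
    only, keeping the effective depth small (hypothetical sizes)."""
    def __init__(self, in_dim=768, hidden=256, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)
        self.fc2 = nn.Linear(hidden, n_classes)

    def forward(self, x):
        h = torch.relu(self.fc1(x))  # ReLU activations define the polytopes
        return self.fc2(h)

head = ReLUHead()
features = torch.randn(8, 768)       # stand-in for frozen ViT encoder output
logits = head(features)
print(logits.shape)                  # torch.Size([8, 10])
```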
Code: https://github.com/neurodata/kdg
Assigned Action Editor: ~Weijian_Deng1
Submission Number: 4311