On the Out-of-Distribution Coverage of Combining Split Conformal Prediction and Bayesian Deep Learning

Paul Scemama; Ariel Kapusta

Published: 19 Feb 2024, Last Modified: 17 Sept 2024. Accepted by TMLR.

Abstract: Bayesian deep learning and conformal prediction are two methods that have been used to convey uncertainty and increase safety in machine learning systems. We focus on combining Bayesian deep learning with split conformal prediction, and on how adding conformal prediction changes the out-of-distribution coverage we would otherwise observe, particularly in multiclass image classification. We suggest that if the model is generally underconfident on the calibration set, the resulting conformal sets may exhibit worse out-of-distribution coverage than simple predictive credible sets (i.e., sets obtained without conformal prediction). Conversely, if the model is overconfident on the calibration set, conformal prediction may improve out-of-distribution coverage. Specifically, we study the extent to which adding conformal prediction increases or decreases out-of-distribution coverage for a variety of inference techniques: (i) stochastic gradient descent, (ii) deep ensembles, (iii) mean-field variational inference, (iv) stochastic gradient Hamiltonian Monte Carlo, and (v) Laplace approximation. Our results suggest that applying conformal prediction to different predictive deep learning methods can have significantly different consequences.
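
The comparison in the abstract, credible sets versus split conformal sets built from the same predictive probabilities, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: `probs` is assumed to be an (n, K) array of posterior predictive probabilities (for example, averaged over ensemble members or posterior samples), the credible set is taken to be the smallest set of top-ranked classes whose mass reaches 1 - alpha, and the conformal score is assumed to be the probability mass of classes ranked strictly above the label (closely related to the adaptive prediction sets score); the paper's exact score may differ. All function names are hypothetical.

```python
import numpy as np


def mass_before(probs):
    """Score every class by the predictive mass of classes ranked strictly above it.

    probs: (n, K) array of predictive probabilities (assumed posterior predictive).
    Returns an (n, K) array of scores.
    """
    order = np.argsort(-probs, axis=1, kind="stable")       # classes by probability, descending
    sorted_p = np.take_along_axis(probs, order, axis=1)
    before_sorted = np.cumsum(sorted_p, axis=1) - sorted_p   # mass strictly above each rank
    scores = np.empty_like(probs)
    np.put_along_axis(scores, order, before_sorted, axis=1)  # map ranks back to class indices
    return scores


def credible_sets(probs, alpha):
    """Smallest top-ranked set whose mass reaches 1 - alpha (no conformal step)."""
    return mass_before(probs) < 1 - alpha


def conformal_threshold(cal_probs, cal_labels, alpha):
    """Split conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_labels)
    scores = mass_before(cal_probs)[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf  # too few calibration points: every class is included
    return np.sort(scores)[k - 1]


def conformal_sets(probs, qhat):
    """Boolean (n, K) mask of classes whose score is at most the calibrated threshold."""
    return mass_before(probs) <= qhat


def coverage_and_size(sets, labels):
    """Empirical coverage and average set size of boolean (n, K) prediction sets."""
    covered = sets[np.arange(len(labels)), labels]
    return covered.mean(), sets.sum(axis=1).mean()


if __name__ == "__main__":
    # Toy calibration set: three identical predictions, evaluated on the calibration
    # points themselves purely to show the direction of the effect (not an OOD test).
    alpha = 0.25
    cal_probs = np.tile([0.5, 0.375, 0.125], (3, 1))
    cases = {
        "underconfident": np.zeros(3, dtype=int),  # label is always the top class: credible sets over-cover
        "overconfident": np.full(3, 2),             # label is always the lowest-ranked class: credible sets under-cover
    }
    for name, labels in cases.items():
        qhat = conformal_threshold(cal_probs, labels, alpha)
        c1, s1 = coverage_and_size(credible_sets(cal_probs, alpha), labels)
        c2, s2 = coverage_and_size(conformal_sets(cal_probs, qhat), labels)
        print(f"{name}: credible cov={c1:.2f} size={s1:.1f} | "
              f"conformal cov={c2:.2f} size={s2:.1f}")
```

Run as a script, it should print `underconfident: credible cov=1.00 size=2.0 | conformal cov=1.00 size=1.0` and `overconfident: credible cov=0.00 size=2.0 | conformal cov=1.00 size=3.0`: calibration shrinks the sets of the underconfident model and enlarges those of the overconfident one. Split conformal coverage guarantees assume that calibration and test data are exchangeable, which is exactly the assumption out-of-distribution data breaks.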

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: We are indebted to the reviewers and the action editor and thank them all for their hard work. We feel that their feedback has significantly improved the paper since its first version. Updates since the last submission:
- Updated the Motivation section to address reviewer NvAv's concerns. In particular, we define underconfidence and overconfidence in terms of credible set coverage on the calibration set, and discuss how this manifests as smaller or larger average set sizes when using conformal prediction. This also involved removing the reliability plots and instead illustrating the calibration dataset coverage and set sizes.
- Addressed all 3 suggested rephrasings from the action editor.
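
A worked sketch of how calibration-set confidence can translate into conformal set size, under assumptions that are ours rather than the paper's: write $\hat p(y \mid x)$ for the posterior predictive, take the credible set $\Gamma_\alpha(x)$ to be the smallest set of top-ranked classes with mass at least $1-\alpha$, and use as conformal score the mass of classes ranked strictly above $y$ (ties ignored). With this score both sets threshold the same quantity, so comparing the thresholds $1-\alpha$ and $\hat q$ decides which set is larger:

```latex
\begin{aligned}
s(x, y) &= \sum_{y' :\, \hat p(y' \mid x) > \hat p(y \mid x)} \hat p(y' \mid x),
  \qquad \Gamma_\alpha(x) = \{\, y : s(x, y) < 1 - \alpha \,\}, \\
\hat c &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{\, y_i \in \Gamma_\alpha(x_i) \,\}
  \quad \text{(credible-set coverage on the calibration set)}, \\
k &= \lceil (n + 1)(1 - \alpha) \rceil \le n, \qquad
  \hat q = s_{(k)}, \qquad C(x) = \{\, y : s(x, y) \le \hat q \,\}, \\
\hat c \ge k/n &\;\Longrightarrow\; \hat q < 1 - \alpha
  \;\Longrightarrow\; C(x) \subseteq \Gamma_\alpha(x) \quad \text{(underconfident: smaller sets)}, \\
\hat c < k/n &\;\Longrightarrow\; \hat q \ge 1 - \alpha
  \;\Longrightarrow\; \Gamma_\alpha(x) \subseteq C(x) \quad \text{(overconfident: larger sets)}.
\end{aligned}
```

Here $s_{(k)}$ is the $k$-th smallest calibration score $s(x_i, y_i)$, and the step from $\hat c$ to $\hat q$ holds because $n \hat c$ counts the calibration scores below $1-\alpha$. Since $k/n$ is slightly above $1-\alpha$, "underconfident" in this sketch means credible-set coverage on the calibration set of roughly more than $1-\alpha$.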

Assigned Action Editor: Vincent Fortuin

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 1857
