Keywords: Counterfactual Explanations, Bayesian Neural Networks, BNN
Abstract: A counterfactual explanation describes the smallest input change required to alter the prediction of an AI model towards a desired outcome. When using neural networks, counterfactuals are obtained using variants of projected gradient descent. Such counterfactuals have been shown to be brittle and implausible, potentially jeopardising their explanatory value. Numerous approaches for obtaining better counterfactuals have been put forward. Even though these solutions address some of the shortcomings, they often fall short of providing an all-around solution for robust and plausible counterfactuals. We hypothesise that this is due to the deterministic nature and limitations of neural networks, which fail to capture the uncertainty of the training data. Bayesian Neural Networks (BNNs) are a well-known class of probabilistic models that could be used to overcome these issues; unfortunately, there is currently no framework for computing counterfactuals for them. In this paper, we fill this gap by proposing a formal framework to define counterfactuals for BNNs and developing algorithmic solutions for computing them. We evaluate our framework on a set of commonly used benchmarks and observe that BNNs produce counterfactuals that are more robust, more plausible, and less costly than those of deterministic baselines.
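For context, the "variants of projected gradient descent" mentioned in the abstract typically minimise a prediction loss plus a proximity cost and project each iterate back onto the feasible input region. Below is a minimal PyTorch sketch of this generic deterministic baseline, not the paper's BNN method; the toy model, the weight `lam`, and the [0, 1] box constraint are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical classifier standing in for the trained (deterministic) model;
# the 2-class toy setup and all names below are illustrative, not from the paper.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
model.eval()

x = torch.rand(1, 4)        # factual input (features assumed scaled to [0, 1])
target = torch.tensor([1])  # desired outcome class
lam = 0.1                   # weight on the proximity (cost) term

x_cf = x.clone().requires_grad_(True)
opt = torch.optim.Adam([x_cf], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    # Push the prediction towards `target` while staying close to `x`.
    loss = (nn.functional.cross_entropy(model(x_cf), target)
            + lam * torch.norm(x_cf - x, p=1))
    loss.backward()
    opt.step()
    with torch.no_grad():
        x_cf.clamp_(0.0, 1.0)  # projection step onto the valid input box

print("counterfactual:", x_cf.detach())
print("prediction:", model(x_cf).argmax(dim=1).item())
```

The L1 proximity term encourages sparse feature changes, one common notion of a "small" counterfactual; the clamp is the projection that keeps iterates in the valid input domain.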
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7999