Uncertainty-Aware Counterfactual Explanations using Bayesian Neural Nets

26 Sept 2024 (modified: 05 Feb 2025) | Submitted to ICLR 2025 | CC BY 4.0
Keywords: Counterfactual Explanations, Bayesian Neural Networks, BNN
Abstract: A counterfactual explanation describes the smallest input change required to alter the prediction of an AI model towards a desired outcome. When using neural networks, counterfactuals are obtained using variants of projected gradient descent. Such counterfactuals have been shown to be brittle and implausible, potentially jeopardising the explanatory aspects of counterfactuals. Numerous approaches for obtaining better counterfactuals have been put forward. Even though these solutions address some of the shortcomings, they often fall short of providing an all-around solution for robust and plausible counterfactuals. We hypothesise this is due to the deterministic nature and limitations of neural networks, which fail to capture the uncertainty of the training data. Bayesian Neural Networks (BNNs) are a well-known class of probabilistic models that could be used to overcome these issues; unfortunately, there is currently no framework for developing counterfactuals for them. In this paper, we fill this gap by proposing a formal framework to define counterfactuals for BNNs and develop algorithmic solutions for computing them. We evaluate our framework on a set of commonly used benchmarks and observe that BNNs produce counterfactuals that are more robust, plausible, and less costly than deterministic baselines.
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 7999
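The abstract mentions two ingredients: counterfactuals found by projected gradient descent, and a Bayesian model whose predictions average over weight uncertainty. The following is a minimal illustrative sketch of how the two might be combined, not the paper's algorithm. It assumes a logistic model whose "posterior" is a set of randomly sampled weight vectors, a box constraint `[lo, hi]` as the projection set, and a squared-distance cost; all function names, parameters, and data here are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def posterior_mean_prob(x, W, b):
    """Mean class-1 probability over S sampled models (W: (S, d), b: (S,))."""
    return float(sigmoid(W @ x + b).mean())

def counterfactual(x0, W, b, threshold=0.5, lam=0.1, lr=0.1,
                   steps=500, lo=0.0, hi=1.0):
    """Projected gradient descent on -log(mean prob) + lam * ||x - x0||^2.

    Stops as soon as the averaged prediction reaches `threshold`.
    Assumption: weight samples (W, b) stand in for a BNN posterior.
    """
    x = x0.astype(float).copy()
    for _ in range(steps):
        p = sigmoid(W @ x + b)                      # per-sample probabilities
        if p.mean() >= threshold:
            break
        # Gradient of the averaged probability with respect to the input.
        grad_mean_p = ((p * (1.0 - p))[:, None] * W).mean(axis=0)
        grad = -grad_mean_p / p.mean() + 2.0 * lam * (x - x0)
        # Gradient step followed by projection onto the box [lo, hi]^d.
        x = np.clip(x - lr * grad, lo, hi)
    return x

# Toy usage: 50 hypothetical weight samples around a fixed mean.
rng = np.random.default_rng(0)
S = 50
W = np.array([4.0, 4.0]) + 0.1 * rng.normal(size=(S, 2))
b = -4.0 + 0.1 * rng.normal(size=S)
x0 = np.array([0.2, 0.2])
x_cf = counterfactual(x0, W, b)
```

Averaging over weight samples before taking the gradient means the search moves towards inputs that most sampled models agree on, which is one plausible reading of why an uncertainty-aware model could yield more robust counterfactuals.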