Relation-Based Counterfactual Explanations for Bayesian Network Classifiers

Published: 01 Jan 2020, Last Modified: 27 Jan 2025IJCAI 2020EveryoneRevisionsBibTeXCC BY-SA 4.0
Abstract: We propose a general method for generating counterfactual explanations (CFXs) for a range of Bayesian Network Classifiers (BCs), e.g. single- or multi-label, binary or multidimensional. We focus on explanations built from relations of (critical and potential) influence between variables, indicating the reasons for classifications, rather than any probabilistic information. We show by means of a theoretical analysis of CFXs’ properties that they serve the purpose of indicating (potentially) pivotal factors in the classification process, whose absence would give rise to different classifications. We then prove empirically for various BCs that CFXs provide useful information in real world settings, e.g. when race plays a part in parole violation prediction, and show that they have inherent advantages over existing explanation methods in the literature.
Loading