Gradient-based Counterfactual Explanations using Tractable Probabilistic Models

Published: 28 Jan 2022, Last Modified: 13 Feb 2023
Venue: ICLR 2022 Submitted
Readers: Everyone
Keywords: Counterfactual example, Sum product networks, tractable probabilistic models, counterfactual explanation.
Abstract: Counterfactual examples are an appealing class of post-hoc explanations for machine learning models. Given an input x of class y, its counterfactual is a contrastive example x' of another class y'. Current approaches primarily solve this task through a complex optimization: they define an objective function based on the loss of the counterfactual outcome y' with hard or soft constraints, then optimize this function as a black box. This “deep learning” approach, however, is rather slow, sometimes tricky, and may result in unrealistic counterfactual examples. In this work, we propose a novel approach that addresses these problems using only two gradient computations based on tractable probabilistic models. First, we compute an unconstrained counterfactual u of x to induce the counterfactual outcome y'. Then, we adapt u to higher-density regions, resulting in x'. Empirical evidence demonstrates the dominant advantages of our approach.
One-sentence Summary: Generating counterfactual examples using tractable probabilistic models.
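
Illustrative sketch (not from the paper): the two-step procedure described in the abstract can be mimicked on a toy problem. The abstract does not give the exact gradients, step sizes, or model, so everything below is an assumption: a hypothetical logistic classifier stands in for the model being explained, an isotropic Gaussian stands in for the tractable probabilistic model (the paper's keywords point to sum-product networks), and the names `two_step_counterfactual`, `step_outcome`, and `step_density` are invented for this example.

```python
import numpy as np

# Hypothetical toy classifier: p(y'=1 | x) = sigmoid(W.x + B).
W = np.array([1.0, -1.0])
B = 0.0

# Hypothetical stand-in for a tractable density of the target class:
# an isotropic Gaussian with mean MU_TARGET and variance SIGMA2.
MU_TARGET = np.array([2.0, -2.0])
SIGMA2 = 1.0


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def prob_target(x):
    """Probability the toy classifier assigns to the target class y'."""
    return sigmoid(W @ x + B)


def grad_log_prob_target(x):
    """Gradient of log p(y' | x) with respect to x for the logistic model."""
    return (1.0 - prob_target(x)) * W


def log_density(x):
    """Log-density of x under the stand-in Gaussian."""
    d = x - MU_TARGET
    return -0.5 * (d @ d) / SIGMA2 - 0.5 * len(x) * np.log(2.0 * np.pi * SIGMA2)


def grad_log_density(x):
    """Gradient of the Gaussian log-density with respect to x."""
    return -(x - MU_TARGET) / SIGMA2


def two_step_counterfactual(x, step_outcome=2.0, step_density=0.5):
    # Step 1 (assumed form): one gradient step toward the outcome y',
    # giving the unconstrained counterfactual u.
    u = x + step_outcome * grad_log_prob_target(x)
    # Step 2 (assumed form): one gradient step toward higher density,
    # giving the final counterfactual x'.
    x_cf = u + step_density * grad_log_density(u)
    return u, x_cf


x = np.array([-1.0, 1.0])  # classified as the original class (p < 0.5)
u, x_cf = two_step_counterfactual(x)
print("p(y'|x) =", prob_target(x))
print("p(y'|u) =", prob_target(u), " log-density(u) =", log_density(u))
print("p(y'|x') =", prob_target(x_cf), " log-density(x') =", log_density(x_cf))
```

In this toy setting the first step flips the predicted class and the second step raises the density of the result; whether single steps suffice in practice depends on the step sizes and models, which the abstract does not specify.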
