TL;DR: We propose a unified explainability framework that generates global explanations in the form of a causal graph, followed by image-level feature attribution and counterfactual explanations.
Abstract: Most current explainability techniques focus on capturing the importance of features in input space. However, given the complexity of models and data-generating processes, the resulting explanations are far from complete, in that they lack an indication of feature interactions and a visualization of their effects. In this work, we propose a novel surrogate-model-based explainability framework to explain the decisions of any CNN-based image classifier by extracting causal relations between the features. These causal relations serve as global explanations from which local explanations of different forms can be obtained. Specifically, we employ a generator to visualize the 'effect' of interactions among features in latent space and derive feature importance therefrom as local explanations. We demonstrate and evaluate explanations obtained with our framework on the Morpho-MNIST, the FFHQ, and the AFHQ datasets.
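The abstract does not include an implementation, but the two-stage pipeline it describes can be sketched at a high level as follows. This is a minimal, hypothetical illustration: the encoder, generator, classifier, and `discover_causal_graph` routine are placeholder names introduced here for exposition and are not taken from the paper.

```python
# Hypothetical sketch of the two-stage pipeline: (1) extract a causal graph over
# latent features from the whole dataset, (2) intervene on latent features,
# propagate effects along the graph, and visualize counterfactuals via a generator.
import numpy as np

rng = np.random.default_rng(0)

def encoder(images):
    # Placeholder: map images to a low-dimensional latent feature vector.
    return images.reshape(len(images), -1)[:, :4]

def generator(latents):
    # Placeholder: map latent features back to image space.
    return np.repeat(latents, 16, axis=1).reshape(len(latents), 8, 8)

def classifier(images):
    # Placeholder stand-in for the CNN classifier whose decisions we explain.
    return (images.mean(axis=(1, 2)) > 0.5).astype(int)

def discover_causal_graph(latents):
    # Placeholder for stage 1: estimate a causal graph over latent features,
    # here crudely approximated by thresholding pairwise correlations.
    corr = np.corrcoef(latents, rowvar=False)
    return (np.abs(corr) > 0.3) & ~np.eye(latents.shape[1], dtype=bool)

# Stage 1: global explanation extracted from the entire dataset.
images = rng.random((128, 8, 8))
latents = encoder(images)
causal_graph = discover_causal_graph(latents)  # adjacency over latent features

# Stage 2: local explanation for a single image.
z = encoder(images[:1])
prediction = classifier(generator(z))

feature_importance = {}
for j in range(z.shape[1]):
    z_cf = z.copy()
    z_cf[0, j] += 1.0                           # intervene on latent feature j
    for k in np.where(causal_graph[j])[0]:      # propagate to downstream features
        z_cf[0, k] += 0.5
    counterfactual = generator(z_cf)            # visualize the 'effect' of the intervention
    feature_importance[j] = float(abs(classifier(counterfactual)[0] - prediction[0]))

print(causal_graph.astype(int))
print(feature_importance)
```

In this sketch, feature importance is read off from how much each graph-consistent intervention changes the classifier's decision; the generated counterfactual images serve as the visual local explanations.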
Submission Track: Full Paper Track
Application Domain: Computer Vision
Survey Question 1: We propose a two-stage explainability framework: the first stage utilizes the entire dataset to extract a causal graph that serves as a global explanation; the second stage follows the extracted causal graph to generate image-level feature attributions and counterfactual explanations.
Survey Question 2: The main limitation of existing approaches that we address in this work is their failure to make use of the model's causal vocabulary when generating explanations. We show that incorporating the causal vocabulary effectively increases the expressivity and faithfulness of the generated explanations.
Survey Question 3: We propose a novel explainability framework that estimates the causal vocabulary in the data to generate local explanations.
Submission Number: 44