Interventional Explanations of Neural Networks
Abstract: Explainability and interpretability play an important role in the adoption of deep neural networks in critical systems. However, standard methods focus on correlation-based measures, which leads to noisy and unstable explanations. In this paper, we propose a novel explanation method grounded in the theory of causal analysis to extract explanatory graphs from pre-trained DNNs. By analyzing the effect of path interventions at various nodes on the model's performance, we reveal the causal mechanisms within hidden layers and isolate the relevant components from noisy ones. We apply our method to vision models trained for object classification to capture high-level semantics from filters causally connected to predictions. Experiments show that the causal graphs reveal the true causes of model behaviour and enable more stable and consistent explanations than standard methods.
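The abstract describes measuring the effect of interventions at hidden nodes on the model's output. A minimal sketch of that kind of intervention-based analysis is shown below, assuming a PyTorch vision model and using a simple zero-ablation of one convolutional filter as a stand-in for the paper's path interventions; the function name, layer choice, and class index are illustrative assumptions, not the authors' method.

```python
import torch
import torchvision.models as models

# Sketch: quantify the causal relevance of one convolutional filter by
# zeroing its activations (an ablation-style intervention) and observing
# the change in the target-class logit. Names and choices are hypothetical.

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def intervention_effect(model, layer, filter_idx, x, target_class):
    """Return the drop in the target-class logit when one filter is zeroed."""
    def zero_filter(module, inputs, output):
        output[:, filter_idx] = 0.0  # do-style intervention on this filter
        return output

    with torch.no_grad():
        baseline = model(x)[0, target_class].item()
        handle = layer.register_forward_hook(zero_filter)
        intervened = model(x)[0, target_class].item()
        handle.remove()
    return baseline - intervened  # large positive drop => causally relevant

# Usage: rank filters in one layer by their intervention effect.
x = torch.randn(1, 3, 224, 224)  # stand-in for a real input image
target = 207                     # hypothetical ImageNet class index
layer = model.layer4[1].conv2
effects = [intervention_effect(model, layer, i, x, target)
           for i in range(layer.out_channels)]
top = sorted(range(len(effects)), key=lambda i: -effects[i])[:5]
print("Most causally relevant filters:", top)
```

Under this sketch, filters whose ablation barely moves the output would be the "noisy" components the abstract says the method filters out, while high-effect filters would form the nodes of the causal explanatory graph.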