Interventional Black-Box Explanations

Published: 28 Jan 2022, Last Modified: 13 Feb 2023, ICLR 2022 Submitted, Readers: Everyone
Keywords: Causal Inference, Interventions, Black-Box Models, Explanations, Deep Neural Networks
Abstract: Deep Neural Networks (DNNs) are powerful systems able to freely evolve on their own from training data. However, as with any highly parametrized mathematical model, explaining an individual prediction of such a model is difficult. We believe that there exist relevant mechanisms inside the structure of post-hoc DNNs that support transparency and interpretability. To capture these mechanisms, we quantify the effects of parameters (pieces of knowledge) on the models' predictions using the framework of causality. We introduce a general formalism of the causal diagram to express cause-effect relations inside the DNN's architecture. Then, we develop a novel algorithm to construct explanations of a DNN's predictions using the $do$-operator. We call our method Interventional Black-Box Explanations. On image classification tasks, we explain the behaviour of the model and extract visual explanations from the effects of the causal filters in convolution layers. We qualitatively demonstrate that our method captures more informative concepts than traditional attribution-based methods. Finally, we believe that our method is orthogonal to logic-based explanation methods and can be leveraged to improve their explanations.
One-sentence Summary: The paper explains post-hoc DNNs using causal inference.
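The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: intervene on one unit with a $do$-style assignment and measure how the prediction changes. Everything here is an assumption made for illustration, not the authors' method: the toy model (each "filter" is a weight vector with a ReLU activation, followed by a linear head and softmax), the choice of $do(a_k = 0)$ as the intervention, and the names `forward` and `filter_effects`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, filters, head, do=None):
    """Toy stand-in for a network (hypothetical, not the paper's model).

    Each filter k produces one ReLU activation a_k = max(0, w_k . x).
    `do` maps a filter index to a forced activation value, i.e. the
    intervention do(a_k = value), which overrides the computed a_k.
    """
    acts = []
    for k, w in enumerate(filters):
        a = max(0.0, sum(wi * xi for wi, xi in zip(w, x)))
        if do is not None and k in do:
            a = do[k]  # intervention: ignore the filter's natural value
        acts.append(a)
    logits = [sum(row[k] * acts[k] for k in range(len(acts))) for row in head]
    return softmax(logits)


def filter_effects(x, filters, head, target):
    """Effect of each filter on the target class probability.

    Computed as P(target) - P(target | do(a_k = 0)) for every filter k.
    Positive values mean the filter supports the target class.
    """
    base = forward(x, filters, head)[target]
    return [
        base - forward(x, filters, head, do={k: 0.0})[target]
        for k in range(len(filters))
    ]


# Hypothetical toy inputs: 2 input features, 3 filters, 2 classes.
x = [1.0, 2.0]
filters = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
head = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
effects = filter_effects(x, filters, head, target=0)
```

In this toy setup, filter 0 feeds only class 0, filter 1 feeds only class 1, and filter 2 is inactive on `x`, so the effects on class 0 come out positive, negative, and zero respectively. A real convolutional network would need the intervention applied to a whole feature map rather than a scalar activation.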