Influence-Directed Explanations for Deep Convolutional Networks

Anupam Datta; Matt Fredrikson; Klas Leino; Linyi Li; Shayak Sen

15 Feb 2018 (modified: 22 Jun 2025) | ICLR 2018 Conference Blind Submission | Readers: Everyone

Abstract: We study the problem of explaining a rich class of behavioral properties of deep neural networks. Our influence-directed explanations approach this problem by peering inside the network to identify neurons with high influence on the property of interest using an axiomatically justified influence measure, and then providing an interpretation for the concepts these neurons represent. We evaluate our approach by training convolutional neural networks on PubFig, ImageNet, and Diabetic Retinopathy datasets. Our evaluation demonstrates that influence-directed explanations (1) localize features used by the network, (2) isolate features distinguishing related instances, (3) help extract the essence of what the network learned about the class, and (4) assist in debugging misclassifications.

TL;DR: We present an influence-directed approach to constructing explanations for the behavior of deep convolutional networks, and show how it can be used to answer a broad set of questions that could not be addressed by prior work.

Keywords: Deep neural networks, convolutional networks, influence measures, explanations

Code: [2 community implementations (Papers with Code)](https://paperswithcode.com/paper/?openreview=SJPpHzW0-)

Community Implementations: [1 code implementation (CatalyzeX)](https://www.catalyzex.com/paper/influence-directed-explanations-for-deep/code)
