- Abstract: We study the problem of explaining a rich class of behavioral properties of deep neural networks. Our influence-directed explanations approach this problem by peering inside the network to identify neurons with high influence on the property of interest using an axiomatically justified influence measure, and then providing an interpretation for the concepts these neurons represent. We evaluate our approach by training convolutional neural networks on Pubfig, ImageNet, and Diabetic Retinopathy datasets. Our evaluation demonstrates that influence-directed explanations (1) localize features used by the network, (2) isolate features distinguishing related instances, (3) help extract the essence of what the network learned about the class, and (4) assist in debugging misclassifications.
- TL;DR: We present an influence-directed approach to constructing explanations for the behavior of deep convolutional networks, and show how it can be used to answer a broad set of questions that could not be addressed by prior work.
- Keywords: Deep neural networks, convolutional networks, influence measures, explanations