Open Peer Review. Open Publishing. Open Access. Open Discussion. Open Directory. Open Recommendations. Open API. Open Source.
Influence-Directed Explanations for Deep Convolutional Networks
Anupam Datta, Matt Fredrikson, Klas Leino, Linyi Li, Shayak Sen
Feb 15, 2018 (modified: Feb 15, 2018)ICLR 2018 Conference Blind Submissionreaders: everyoneShow Bibtex
Abstract:We study the problem of explaining a rich class of behavioral properties of deep neural networks. Our influence-directed explanations approach this problem by peering inside the network to identify neurons with high influence on the property of interest using an axiomatically justified influence measure, and then providing an interpretation for the concepts these neurons represent. We evaluate our approach by training convolutional neural networks on Pubfig, ImageNet, and Diabetic Retinopathy datasets. Our evaluation demonstrates that influence-directed explanations (1) localize features used by the network, (2) isolate features distinguishing related instances, (3) help extract the essence of what the network learned about the class, and (4) assist in debugging misclassifications.
TL;DR:We present an influence-directed approach to constructing explanations for the behavior of deep convolutional networks, and show how it can be used to answer a broad set of questions that could not be addressed by prior work.