Keywords: Deep Neural Networks, Attribution methods, Theory of deep learning
TL;DR: Four existing backpropagation-based attribution methods are fundamentally similar. How can we assess this?
Abstract: Understanding the flow of information in Deep Neural Networks (DNNs) is a challenging problem that has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, there have been only a few attempts to compare them from a theoretical perspective. What is more, no exhaustive empirical comparison has been performed in the past. In this work, we analyze four gradient-based attribution methods and formally prove conditions of equivalence and approximation between them. By reformulating two of these methods, we construct a unified framework that enables a direct comparison, as well as an easier implementation. Finally, we propose a novel evaluation metric, called Sensitivity-n, and test the gradient-based attribution methods alongside a simple perturbation-based attribution method on several datasets in the domains of image and text classification, using various network architectures.
Code: [kundajelab/deeplift](https://github.com/kundajelab/deeplift) + [1 community implementation](https://paperswithcode.com/paper/?openreview=Sy21R9JAW)
Data: [CIFAR-10](https://paperswithcode.com/dataset/cifar-10), [IMDb Movie Reviews](https://paperswithcode.com/dataset/imdb-movie-reviews)
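The abstract names the Sensitivity-n metric without defining it. In the paper, an attribution method satisfies Sensitivity-n when, for any subset of n input features, the sum of their attributions equals the change in the target output caused by removing those features; empirically this is measured as the Pearson correlation between the two quantities over randomly sampled subsets. The following is a minimal NumPy sketch under stated assumptions, not the deeplift API: the toy model f(x) = w·x + ½ Σ v_i x_i², its analytic gradient, "removal" meaning replacement by a zero baseline, and every function name are hypothetical illustrations. It compares Gradient*Input and Integrated Gradients, two of the gradient-based methods this line of work analyzes.

```python
import numpy as np


# Hypothetical toy model (an assumption for illustration, not a real network):
# f(x) = w.x + 0.5 * sum(v_i * x_i^2), nonlinear and additively separable.
def model(x, w, v):
    return float(np.dot(w, x) + 0.5 * np.dot(v, x ** 2))


# Analytic gradient of the toy model with respect to x.
def grad(x, w, v):
    return w + v * x


# Gradient*Input: element-wise product of the input and the gradient at x.
def gradient_times_input(x, w, v):
    return x * grad(x, w, v)


# Integrated Gradients along the straight path from a baseline (default: zeros)
# to x, approximated with a midpoint Riemann sum of `steps` gradient evaluations.
def integrated_gradients(x, w, v, baseline=None, steps=50):
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad(baseline + a * (x - baseline), w, v)
    return (x - baseline) * total / steps


# Sensitivity-n estimate: Pearson correlation between the summed attributions of
# a random n-feature subset and the output drop when that subset is set to the
# baseline. Requires 0 < n < x.size so the sampled sums are not all identical.
def sensitivity_n(attr, x, w, v, n, num_subsets=100, baseline=None, seed=0):
    if baseline is None:
        baseline = np.zeros_like(x)
    rng = np.random.default_rng(seed)
    f_x = model(x, w, v)
    attr_sums, output_deltas = [], []
    for _ in range(num_subsets):
        idx = rng.choice(x.size, size=n, replace=False)
        x_removed = x.copy()
        x_removed[idx] = baseline[idx]
        attr_sums.append(attr[idx].sum())
        output_deltas.append(f_x - model(x_removed, w, v))
    return float(np.corrcoef(attr_sums, output_deltas)[0, 1])


rng = np.random.default_rng(42)
x, w, v = rng.normal(size=(3, 10))
gi = gradient_times_input(x, w, v)
ig = integrated_gradients(x, w, v)
print("Gradient*Input:", sensitivity_n(gi, x, w, v, n=3))
print("Integrated Gradients:", sensitivity_n(ig, x, w, v, n=3))
```

Because this toy model is additively separable and zero at the baseline, Integrated Gradients assigns each feature exactly its own contribution, so its correlation is 1 up to floating-point error. Gradient*Input counts the quadratic term as v_i x_i² instead of ½ v_i x_i² and typically scores lower. With v = 0 (a linear model) the two methods coincide. Real networks are not separable, so both scores would generally fall below 1 and vary with n.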