A Benchmark for Interpretability Methods in Deep Neural Networks

Sara Hooker, Dumitru Erhan, Pieter-Jan Kindermans, Been Kim

06 Sept 2019 (modified: 05 May 2023)
NeurIPS 2019
Readers: Everyone
Abstract: We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are no better than a random designation of feature importance. Only certain ensemble-based approaches (VarGrad and SmoothGrad-Squared) outperform such a random assignment of importance.
Code Link: https://github.com/google-research/google-research/tree/master/interpretability_benchmark
CMT Num: 5145
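
The abstract names VarGrad and SmoothGrad-Squared without defining them. As commonly described, both are ensemble estimators computed over noise-perturbed copies of an input: SmoothGrad-Squared averages the squared gradients, and VarGrad takes the per-feature variance of the gradients. The sketch below illustrates that idea only and is not the benchmark code from the repository linked above. The toy model `f(x) = sum(w_i * x_i**2)`, the function names, and the values of `n_samples` and `sigma` are all assumptions chosen for illustration.

```python
import random
from statistics import fmean, pvariance


def toy_gradient(x, weights):
    """Input gradient of the hypothetical toy model f(x) = sum(w_i * x_i**2).

    Stands in for the gradient of a network output with respect to its input.
    """
    return [2.0 * w * xi for w, xi in zip(weights, x)]


def noisy_gradients(x, weights, n_samples=50, sigma=0.15, seed=0):
    """Collect gradients at n_samples Gaussian-perturbed copies of x."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    samples = []
    for _ in range(n_samples):
        noisy_x = [xi + rng.gauss(0.0, sigma) for xi in x]
        samples.append(toy_gradient(noisy_x, weights))
    return samples


def smoothgrad_squared(samples):
    """Per-feature mean of the squared gradients across the noisy samples."""
    return [fmean([g * g for g in feature]) for feature in zip(*samples)]


def vargrad(samples):
    """Per-feature (population) variance of the gradients across the noisy samples."""
    return [pvariance(feature) for feature in zip(*samples)]


# Illustrative usage: three input features, the second of which the toy model ignores.
weights = [1.0, 0.0, 3.0]
x = [0.5, 0.5, 0.5]
samples = noisy_gradients(x, weights)
sg_sq_scores = smoothgrad_squared(samples)
vargrad_scores = vargrad(samples)
```

In this toy setting the feature with zero weight receives a score of zero from both estimators, and the more heavily weighted feature receives the larger score. Whether such rankings are informative for a real network is the question the benchmark itself addresses.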