Towards Predictable Feature Attribution: Revisiting and Improving Guided BackPropagation

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: explanation, interpretation, BP-based attributions, predictability
Abstract: Recently, backpropagation (BP)-based feature attribution methods have been widely adopted to interpret the internal mechanisms of convolutional neural networks (CNNs); such methods are expected to be human-understandable (lucidity) and faithful to the decision-making process (fidelity). In this paper, we introduce a novel property for feature attribution: predictability, meaning that users can forecast the behavior of an interpretation method. Given the evidence that many attribution methods exhibit unexpected and harmful phenomena such as class insensitivity, predictability is critical for preventing users from over-trusting or misusing them. Observing that many intuitive improvements to lucidity and fidelity tend to sacrifice predictability, we propose a new visual explanation method called TR-GBP (Theoretical Refinements of Guided BackPropagation), which revisits and improves GBP from a theoretical perspective rather than solely optimizing attribution performance. Qualitative and quantitative experiments show that TR-GBP is visually sharper, eliminates the fidelity problems of GBP, and effectively predicts its own possible behaviors, so that we can easily discriminate prediction errors from interpretation errors. The code for TR-GBP is available in the supplementary material and will be open-sourced.
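For context, standard Guided BackPropagation modifies the backward pass of each ReLU so that only positive gradients flow through positive activations. Below is a minimal PyTorch sketch of that GBP baseline, the method the paper refines; it does not reproduce TR-GBP's theoretical refinements, and the model choice and input are placeholder assumptions.

```python
# Minimal sketch of standard Guided BackPropagation (the baseline that
# TR-GBP refines); the paper's refinements are NOT reproduced here.
import torch
import torchvision.models as models

# Placeholder model: any ReLU-based CNN works here.
model = models.vgg16(weights=None).eval()
for m in model.modules():
    if isinstance(m, torch.nn.ReLU):
        m.inplace = False  # in-place ReLUs interfere with backward hooks

def guided_relu_hook(module, grad_input, grad_output):
    # ReLU's own backward has already zeroed positions where the forward
    # activation was negative; clamping the remaining gradient to be
    # non-negative yields the GBP rule (positive on both passes).
    return (torch.clamp(grad_input[0], min=0.0),)

hooks = [m.register_full_backward_hook(guided_relu_hook)
         for m in model.modules() if isinstance(m, torch.nn.ReLU)]

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # placeholder input
score = model(x)[0].max()  # score of the top predicted class
score.backward()
attribution = x.grad       # GBP saliency map, same shape as the input

for h in hooks:
    h.remove()
```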
One-sentence Summary: A preliminary introduction to and study of predictability for attribution methods, and a new predictable attribution method, TR-GBP.
Supplementary Material: zip