Attribution Scores are Redundant: Explaining Feature Contribution By Trajectories

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Keywords: Interpretability, Trajectory Importance, Combinatorial Optimization
Abstract: Opening black boxes and revealing the inner mechanisms of deep models is vital for applying them to real-world tasks. As one of the most intuitive and straightforward forms of explanation for deep models, attribution methods have been studied extensively. Existing attribution methods typically assign an attribution score to each individual feature as an explanation. However, when explanations are used or evaluated in practice, what really matters is not the attribution scores but the rank order of the features (e.g., identifying the top-contributing features, or checking how the model output changes as features are masked in order). In other words, computing attribution scores is a redundant step in obtaining explanations. To address this, we propose a novel framework, TRAjectory importanCE (TRACE), which directly provides feature-ranking explanations. Our method offers several improvements. First, TRACE greatly reduces the set of feasible explanations, allowing us to actually solve for the best explanation. Second, TRACE is able to achieve the theoretically grounded best possible explanation in the commonly used deletion evaluation. Third, we provide extensive experiments validating that TRACE outperforms attribution methods by a significant margin.
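The deletion evaluation mentioned in the abstract can be illustrated with a small sketch. This is not the TRACE algorithm; it is a toy example under stated assumptions: `toy_model`, the zero baseline, the occlusion-based scores, and the exhaustive search in `best_trajectory` are all hypothetical stand-ins chosen for illustration. It shows how a feature order is scored by masking features one at a time, and that an order obtained by sorting per-feature scores need not be the best order under that metric.

```python
from itertools import permutations


def toy_model(x):
    # Hypothetical model: an interaction between features 0 and 1,
    # plus two independent linear terms.
    return 2.0 * x[0] * x[1] + 1.0 * x[2] + 0.5 * x[3]


def deletion_curve(model, x, order, baseline=0.0):
    """Model outputs as features are masked one by one in the given order."""
    masked = list(x)
    outputs = [model(masked)]
    for i in order:
        masked[i] = baseline  # assumption: masking means setting to a baseline
        outputs.append(model(masked))
    return outputs


def deletion_score(model, x, order, baseline=0.0):
    """Mean of the deletion curve; lower means the output drops faster."""
    curve = deletion_curve(model, x, order, baseline)
    return sum(curve) / len(curve)


def best_trajectory(model, x, baseline=0.0):
    """Exhaustive search over all orders; only feasible for a few features."""
    return min(
        permutations(range(len(x))),
        key=lambda order: deletion_score(model, x, order, baseline),
    )


x = [1.0, 1.0, 1.0, 1.0]

# Ranking derived from per-feature occlusion scores (a simple attribution).
occlusion = [
    toy_model(x) - toy_model(x[:i] + [0.0] + x[i + 1:]) for i in range(len(x))
]
score_order = sorted(range(len(x)), key=lambda i: -occlusion[i])

# Ranking found by optimizing the deletion metric directly.
direct_order = best_trajectory(toy_model, x)

print("score-based order:", score_order, deletion_score(toy_model, x, score_order))
print("direct order:     ", list(direct_order), deletion_score(toy_model, x, direct_order))
```

In this toy case the interaction term makes features 0 and 1 look equally important when scored individually, yet once one of them is masked the other no longer contributes, so an order searched directly against the metric can place it last. The exhaustive search grows factorially with the number of features and is only meant to make the objective concrete.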
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)
TL;DR: We propose a novel form of explanation that not only outperforms attribution methods on the most commonly used insertion/deletion metrics, but can also theoretically achieve the best possible explanation under such metrics.
Supplementary Material: zip