IMAST: Importance-Aware Statistical Test for Transformer Interpretability Evaluation

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Primary Area: visualization or interpretation of learned representations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Post-hoc Explainability, Vision Transformer, Explainable AI
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: We propose an evaluation framework for Vision Transformer explanations, which sets a robust benchmark for evaluating faithfulness and provides guidance for future development of Transformer interpretability.
Abstract: Post-hoc explanations offer a promising avenue for interpreting Transformer models. Despite plausible visualizations, rigorous evaluation of their efficacy remains largely unexplored. In this paper, we focus on the principle of faithfulness, a fundamental property of explanation methods: the importance scores derived from an explanation should reflect the anticipated impact of the corresponding input elements. To this end, we propose a novel evaluation framework, the IMportance-Aware Statistical Test (IMAST). Unlike traditional metrics that rely on cumulative perturbation and quantify the resulting performance reduction, IMAST performs statistical comparisons among individual pixel subsets and aggregates their importance-score differences into a single faithfulness coefficient. Extensive experiments expose the shortcomings of existing metrics in aligning with the faithfulness assumption, as they often fail to distinguish Random Attribution from advanced explanations. In contrast, IMAST establishes an effective baseline for evaluating faithfulness, providing a robust benchmark for explanation methods. Moreover, using the proposed IMAST, ablation studies show that incorporating gradient information and cross-layer aggregation significantly improves the faithfulness of attention-based methods, offering guidance for the future development of Transformer interpretability.
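The submission page gives no implementation details, so the following is only a minimal illustrative sketch of how an importance-aware statistical comparison of this kind could be organized. It assumes (these choices are not stated in the source): equal-size pixel subsets ranked by attribution, single-subset rather than cumulative perturbation, a paired one-sided Wilcoxon signed-rank test across images, and a coefficient defined as the fraction of correctly ordered subset pairs that reach significance. The function names, the `perturb` callable, and the `model` interface are hypothetical.

```python
# Illustrative sketch only -- not the authors' released implementation of IMAST.
import numpy as np
from scipy.stats import wilcoxon


def subset_drops(model, images, attributions, perturb, num_subsets=10):
    """For each image, split pixels into `num_subsets` groups by descending
    attribution and record the prediction drop when each group alone is
    perturbed. Assumes `model(img)` returns the predicted-class probability
    and `perturb(img, idx)` masks the pixels at flat indices `idx`.
    Returns an array of shape (num_images, num_subsets)."""
    drops = []
    for img, attr in zip(images, attributions):
        order = np.argsort(attr.ravel())[::-1]       # most important pixels first
        groups = np.array_split(order, num_subsets)  # equal-size, non-cumulative subsets
        base = model(img)
        drops.append([base - model(perturb(img, g)) for g in groups])
    return np.asarray(drops)


def faithfulness_coefficient(drops, alpha=0.05):
    """Compare every ordered pair of subsets with a paired one-sided Wilcoxon
    signed-rank test across images: a more important subset should cause a
    larger drop. The coefficient is the fraction of pairs for which this
    ordering holds significantly."""
    _, k = drops.shape
    wins, total = 0, 0
    for i in range(k):
        for j in range(i + 1, k):  # subset i is ranked more important than subset j
            _, p = wilcoxon(drops[:, i], drops[:, j], alternative="greater")
            wins += p < alpha
            total += 1
    return wins / total
```

Under this reading, a Random Attribution baseline passed through the same pipeline should score near chance, which is exactly the separation the abstract argues existing cumulative-perturbation metrics fail to provide.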
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2678