Beyond Intuition: Rethinking Token Attributions inside Transformers

Published: 08 Feb 2023, Last Modified: 28 Feb 2023, Accepted by TMLR
Abstract: The multi-head attention mechanism, and more broadly Transformer-based models, have long been in the spotlight, not only for text processing but also for computer vision. Several recent works explore token attributions along the intrinsic decision process of these models. However, ambiguity in the attribution formulation can lead to an accumulation of errors, which makes the interpretation less trustworthy and less applicable to different variants. In this work, we propose a novel method to approximate token contributions inside Transformers. Starting from the partial derivative with respect to each token, we divide the interpretation process into attention perception and reasoning feedback via the chain rule, and explore each part individually with explicit mathematical derivations. For attention perception, we propose head-wise and token-wise approximations to learn how the tokens interact to form the pooled vector. For reasoning feedback, we adopt a noise-decreasing strategy by applying integrated gradients to the last attention map. Our method is further validated qualitatively and quantitatively through faithfulness evaluations across different settings: single modality (BERT and ViT) and bi-modality (CLIP), different model sizes (ViT-L), and different pooling strategies (ViT-MAE), demonstrating broad applicability and clear improvements over existing methods.
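To make the reasoning-feedback step concrete, below is a minimal sketch of integrated gradients applied to a recorded last-layer attention map, as described in the abstract. It assumes a hypothetical `forward_with_attn` wrapper (e.g., a hook that re-runs the model with a substituted attention map) and a zero baseline; this is an illustrative approximation, not the authors' implementation.

```python
import torch

def integrated_gradients_on_attention(forward_with_attn, attn_map, target_idx, steps=20):
    """Approximate integrated gradients of a target logit w.r.t. the
    last-layer attention map via a Riemann sum over a zero baseline.

    forward_with_attn: hypothetical callable taking an attention map of the
        same shape as `attn_map` and returning the model logits (1-D tensor).
    attn_map: recorded last-layer attention map, shape (heads, tokens, tokens).
    target_idx: index of the class/logit to explain.
    """
    baseline = torch.zeros_like(attn_map)
    total_grads = torch.zeros_like(attn_map)

    for alpha in torch.linspace(0.0, 1.0, steps):
        # Interpolate between the baseline and the recorded attention map.
        interpolated = (baseline + alpha * (attn_map - baseline)).requires_grad_(True)
        logits = forward_with_attn(interpolated)
        logits[target_idx].backward()
        total_grads += interpolated.grad

    # Average path gradient, scaled by the input difference (IG formula).
    ig = (attn_map - baseline) * total_grads / steps

    # One common readout (assumption): average over heads and take the row of
    # the [CLS]/pooled token as per-token relevance scores.
    relevance = ig.clamp(min=0).mean(dim=0)[0]
    return relevance
```

Compared with using the raw gradient of the last attention map, averaging gradients along the path from a baseline tends to reduce noise in the resulting token relevances, which is the motivation stated for the noise-decreasing strategy.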
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- (Page 3, Section 3) We reorganized the paragraph at the beginning of Section 3 with the section numbers and equations for a closer connection to the roadmap in Figure 2.
- (Page 3, Section 3, Preliminaries) We added a paragraph following Transformers to recall the important mathematical concepts used in our derivation, such as basis, coordinates, and partial derivatives, and to introduce their notations such as $B$ and $P_{0, \cdot}^{(L)}$.
- (Pages 3-7, Section 3.1 & Section 3.2) We rephrased some sentences in Section 3 to avoid confusing pointers and references.
- (Page 3, Section 3.1 & Appendix A) We changed the notation for coordinates from $\widetilde{x}_i$ to $t_i$ to better distinguish the basis vector $\widetilde{X}_i$ from the coordinates.
- (Page 1, footer) We provided the link to our code on GitHub.
Code: https://github.com/jiaminchen-1031/transformerinterp & https://github.com/PaddlePaddle/InterpretDL
Assigned Action Editor: ~Yonatan_Bisk1
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Submission Number: 521