Attention Mechanisms Don’t Learn Additive Models: Rethinking Feature Importance for Transformers

Tobias Leemann; Alina Fastowski; Felix Pfeiffer; Gjergji Kasneci

Published: 06 Jan 2025, Last Modified: 06 Jan 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: We address the critical challenge of applying feature attribution methods to the transformer architecture, which dominates current applications in natural language processing and beyond. Traditional attribution methods in explainable AI (XAI) explicitly or implicitly rely on linear or additive surrogate models to quantify the impact of input features on a model's output. In this work, we formally prove an alarming incompatibility: transformers are structurally incapable of representing the linear or additive surrogate models used for feature attribution, which undermines the grounding of these conventional explanation methodologies. To address this discrepancy, we introduce the Softmax-Linked Additive Log Odds Model (SLALOM), a novel surrogate model specifically designed to align with the transformer framework. SLALOM delivers a range of insightful explanations on both synthetic and real-world datasets. We highlight SLALOM's unique efficiency-quality curve by showing that it can either produce explanations with substantially higher fidelity than competing surrogate models or provide explanations of comparable quality at a fraction of their computational cost. We release code for SLALOM as an open-source project at https://github.com/tleemann/slalom_explanations.
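To make the contrast in the abstract concrete, the following toy sketch compares an additive surrogate with a softmax-weighted one. It is an illustration only: the functional form of the second model is an assumption suggested by the name "Softmax-Linked Additive Log Odds", and the function names, token scores, and inputs are hypothetical rather than taken from the paper or its repository.

```python
import math

def additive_log_odds(tokens, weights):
    """Additive surrogate: each token contributes a fixed amount to the output."""
    return sum(weights[t] for t in tokens)

def softmax_weighted_log_odds(tokens, importance, value):
    """Softmax-weighted surrogate (assumed form): per-token values are
    averaged using softmax weights computed from per-token importances."""
    exps = [math.exp(importance[t]) for t in tokens]
    total = sum(exps)
    return sum(e / total * value[t] for e, t in zip(exps, tokens))

# Hypothetical per-token scores, chosen only for illustration.
weights = {"good": 2.0, "movie": 0.0}
importance = {"good": 1.0, "movie": 0.0}
value = {"good": 2.0, "movie": 0.0}

short = ["good", "movie"]
repeated = ["good", "movie"] * 3  # same content, repeated three times

print(additive_log_odds(short, weights), additive_log_odds(repeated, weights))
print(softmax_weighted_log_odds(short, importance, value),
      softmax_weighted_log_odds(repeated, importance, value))
```

In this sketch the additive output triples when the input is repeated, while the softmax-weighted output stays the same because the weights are normalized over the sequence. This is one simple way in which the two model families behave differently; it is not a substitute for the formal result stated in the abstract.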

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: Camera-Ready Version. Main additions include:
* Emphasized that SLALOM is a token-level method in the limitations part of the conclusion as pointed out by the reviewer (Section 7)
* Included results for larger models in the main paper (Table 1b)
* Made code available online and included link in the abstract as suggested by reviewers
* Clarifications in the text as mentioned in the meta-review (Section 2, 5, Caption of Figures 3 and 4)
* Fixed typos and rephrased some sentences for additional clarity

Update: Only minor changes. Made another pass for grammar and fixed some additional typos.

Code: https://github.com/tleemann/slalom_explanations

Supplementary Material: zip

Assigned Action Editor: ~Shiyu_Chang2

Submission Number: 3531
