Influence Patterns for Explaining Information Flow in BERT

Kaiji Lu; Zifan Wang; Piotr Mardziel; Anupam Datta

Published: 09 Nov 2021, Last Modified: 05 May 2023

NeurIPS 2021 Poster

Readers: Everyone

Keywords: BERT, interpretability, attention, information flow, Transformers

Abstract: While "attention is all you need" may be proving true, we do not know why: attention-based transformer models such as BERT are superior, but how information flows from input tokens to output predictions is unclear. We introduce influence patterns, abstractions of sets of paths through a transformer model. Patterns quantify and localize the flow of information to paths passing through a sequence of model nodes. Experimentally, we find that a significant portion of information flow in BERT goes through skip connections instead of attention heads. We further show that consistency of patterns across instances is an indicator of BERT's performance. Finally, we demonstrate that patterns account for far more model performance than previous attention-based and layer-based methods.
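Illustrative sketch (not from the paper or its code release): the idea of attributing information flow to individual paths can be seen in a toy stack of two linear residual layers, each mapping h to h + W h. The input-to-output Jacobian then splits exactly into one term per path, depending on whether each layer is crossed via the skip connection or via the branch. The weights W1 and W2 and the helpers matmul and matadd are hypothetical names chosen for this example. BERT's layers are nonlinear and the authors' patterns are defined over its actual computational graph, so this only shows the bookkeeping, under those simplifying assumptions.

```python
# Toy illustration only: two linear residual layers, each h -> h + W h.
# W1 and W2 are hypothetical 2x2 weights standing in for a non-skip branch.

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(*ms):
    """Elementwise sum of any number of same-shaped matrices."""
    return [[sum(m[i][j] for m in ms) for j in range(len(ms[0][0]))]
            for i in range(len(ms[0]))]

I = [[1.0, 0.0], [0.0, 1.0]]
W1 = [[0.5, -0.25], [0.0, 0.25]]
W2 = [[0.25, 0.0], [0.5, -0.5]]

# Full input-to-output Jacobian of the stack: (I + W2)(I + W1).
total = matmul(matadd(I, W2), matadd(I, W1))

# The same Jacobian, split by the route taken through (layer 1, layer 2).
paths = {
    ("skip", "skip"): I,
    ("branch", "skip"): W1,
    ("skip", "branch"): W2,
    ("branch", "branch"): matmul(W2, W1),
}

# Summing the per-path terms recovers the full Jacobian.
recombined = matadd(*paths.values())
```

In this toy setting, comparing the size of the ("skip", "skip") term with the others is the analogue of asking how much of the flow bypasses the branches; any conclusion about BERT itself has to come from the paper's experiments, not from this sketch.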

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

TL;DR: This paper introduces a new technique for explaining the information flow in BERT's computational graph using a gradient-based method.

Supplementary Material: pdf

Code: https://github.com/caleblu/influence-patterns
