Keywords: Attention sink; GPT-2
Abstract: Transformers commonly exhibit an attention sink: disproportionately high attention to the first position. We study this behavior in GPT-2–style models with learned query biases and absolute positional embeddings. Combining analysis with targeted interventions, we find that the sink arises from the interaction among (i) a learned query bias, (ii) the first-layer transformation of the positional encoding, and (iii) structure in the key projection. Together with observations of sinks in models without query biases or absolute positional embeddings (e.g., ALiBi), this indicates that attention sinks do not arise from a single universal mechanism but instead depend on the architecture. These findings inform the mitigation of attention sinks and motivate broader investigation of sink mechanisms across architectures.
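The mechanism the abstract describes can be illustrated with a minimal, self-contained sketch (not the paper's actual analysis): if the key of the first position carries a distinctive component (a stand-in for the first-layer transform of the absolute positional embedding) and the learned query bias aligns with that same direction, softmax attention concentrates on position 0. All dimensions, magnitudes, and the `sink_dir` vector below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
d = 16  # head dimension (illustrative)
T = 8   # sequence length (illustrative)

# Keys: position 0 gets a distinctive component, standing in for
# structure contributed by the positional encoding at the first position.
keys = rng.normal(size=(T, d))
sink_dir = rng.normal(size=d)
sink_dir /= np.linalg.norm(sink_dir)
keys[0] += 6.0 * sink_dir

# Query = content part + a "learned query bias" aligned with sink_dir.
query = rng.normal(size=d) + 6.0 * sink_dir

# Scaled dot-product attention scores for this single query.
attn = softmax(keys @ query / np.sqrt(d))
print(attn[0], attn[1:].mean())  # mass on position 0 dominates
```

Removing either the bias term on `query` or the extra component on `keys[0]` flattens the distribution, mirroring the interaction (rather than any single factor) that the paper identifies.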
Paper Type: Short
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: counterfactual / contrastive explanations
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 1158