Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification

Published: 05 Mar 2025, Last Modified: 31 Mar 2025
Venue: SLLM
License: CC BY 4.0
Track: long paper (up to 4 pages)
Keywords: sparse attention, long-context, LLMs
TL;DR: We propose a simple efficient criterion to query-adaptively determine whether an attention head processes long-context information.
Abstract: The ability to process long contexts is crucial for many natural language processing tasks, yet it remains a significant challenge. While substantial progress has been made in enhancing the efficiency of attention mechanisms, there is still a gap in understanding how attention heads function in long-context settings. In this paper, we observe that while certain heads consistently attend only to local information, others swing between attending to local and long-context information depending on the query. This raises the question: can we identify which heads require long-context information to predict the next token accurately? We demonstrate that it is possible to predict which heads are crucial for long-context processing using only local keys. The core idea is to exploit a simple model for the long-context scores via second-moment approximations. These findings unveil simple properties of attention in the context of long sequences, and open the door to potentially significant gains in efficiency.
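To make the abstract's idea concrete, below is a minimal sketch of a per-head, per-query criterion of the kind described: attention scores are computed exactly only on a local window of keys, the score distribution over the remaining (unseen) keys is modeled from the local scores' mean and variance (a second-moment, Gaussian-style approximation), and the head is flagged as long-context if the predicted attention mass falling outside the window is large. The function name `needs_long_context`, the threshold `tau`, and the Gaussian modeling choice are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

def needs_long_context(q, k_local, n_total, tau=0.5):
    """Hypothetical query-adaptive test for whether a head needs long context.

    q:       query vector of dimension d
    k_local: (n_local, d) array of keys in the local window
    n_total: total number of keys in the full context
    tau:     threshold on the predicted non-local attention mass (assumed)
    """
    d = q.shape[-1]
    s_local = k_local @ q / np.sqrt(d)          # exact scores on local keys
    mu, var = s_local.mean(), s_local.var()     # second-moment estimate of the score distribution

    n_local = k_local.shape[0]
    n_far = n_total - n_local

    # Softmax normalizer split into the observed local part and a modeled
    # distant part: for Gaussian scores, E[exp(s)] = exp(mu + var / 2).
    c = s_local.max()                           # shared shift for numerical stability
    z_local = np.exp(s_local - c).sum()
    z_far = n_far * np.exp(mu + 0.5 * var - c)

    far_mass = z_far / (z_local + z_far)        # predicted attention mass on distant keys
    return far_mass > tau

# Usage with random data: a query, a 128-key local window, and an 8192-token context.
rng = np.random.default_rng(0)
q = rng.normal(size=64)
k_local = rng.normal(size=(128, 64))
print(needs_long_context(q, k_local, n_total=8192))
```

In this sketch, heads for which the test returns False could in principle skip attending to distant keys for the current query, which is the source of the efficiency gains the abstract alludes to.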
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 42
