Keywords: Self-Attention, Clusters, Transformers, Inductive bias, Representation geometry, Model diagnostics
TL;DR: A training-free method using attention's innate clustering reveals how architectural choices affect model performance.
Abstract: We introduce a parameter-free framework that isolates the self-attention mechanism by stripping away all learned parameters. Through iterative application, we demonstrate that self-attention alone intrinsically drives the formation of semantically meaningful clusters in the representation space. Analyzing this behavior across global, local-window, and hybrid attention patterns reveals their inherent geometric biases independent of training. Crucially, we find that query scaling (as used in Longformer) induces an implicit dimensionality reduction that systematically improves model generalization, an insight we validate experimentally. This geometric bias is consistent across both low-dimensional data and high-dimensional real-world representations. Probing a pre-trained model confirms that the clustering behavior is architecturally inherent and further refined by learning. Our work provides a useful diagnostic tool for evaluating attention architectures prior to training.
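To make the core idea concrete, here is a minimal illustrative sketch (not the authors' exact framework) of iterating parameter-free self-attention, assuming the standard softmax form with Q = K = V = X and no learned projections; the toy data and variable names are assumptions for demonstration only.

```python
# Minimal sketch: repeatedly apply softmax(X X^T / sqrt(d)) X with no learned
# weights and observe that points drift toward cluster representatives.
import numpy as np

def parameter_free_attention(X: np.ndarray) -> np.ndarray:
    """One round of self-attention with Q = K = V = X (no learned parameters)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                 # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # row-wise softmax
    return attn @ X                               # convex re-mixing of tokens

rng = np.random.default_rng(0)
# Toy data (an assumption): two well-separated Gaussian blobs standing in for
# two semantic groups of tokens.
X = np.concatenate([rng.normal(+2.0, 0.5, size=(16, 8)),
                    rng.normal(-2.0, 0.5, size=(16, 8))])

for _ in range(20):
    X = parameter_free_attention(X)

# Within-group spread shrinks much faster than the gap between group means,
# i.e. the untrained attention iteration alone produces cluster structure.
within = np.linalg.norm(X[:16] - X[:16].mean(0), axis=1).mean()
between = np.linalg.norm(X[:16].mean(0) - X[16:].mean(0))
print(f"within-cluster spread: {within:.4f}, between-cluster gap: {between:.4f}")
```

The same loop can be repeated with local-window or hybrid attention masks, or with queries rescaled before the softmax, to compare the geometric biases the abstract describes.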
Primary Area: interpretability and explainable AI
Supplementary Material: zip
Submission Number: 5040