Order by Scale: Relative‑Magnitude Relational Composition in Attention‑Only Transformers

Published: 08 Nov 2025 · Last Modified: 08 Nov 2025 · ResponsibleFM @ NeurIPS 2025 · CC BY 4.0
Keywords: Circuit analysis, Causal interventions, AI Safety
TL;DR: We find a new relational composition method based on relative vector magnitudes in a toy model, challenging the common view that transformer features can be treated as binary on/off switches.
Abstract: LLMs and other transformers learn relational composition mechanisms to solve tasks such as tracking information about subjects (``Alice lives in France. Bob lives in Thailand.'') to answer questions, or precomputing sub-paths in parallel in graph-based path-finding problems. There is a deep theoretical literature on vector composition methods, yet we lack empirical studies of which mechanisms transformers learn in practice. In particular, different composition methods affect sparse autoencoders (SAEs), a popular method for decomposing model activations, in different ways. We present empirical evidence in a controlled attention-only transformer that ordered relational information can be encoded via a relative magnitude-based mechanism, i.e. by a weighted sum of vectors, rather than the direction-based mechanisms predicted by prior work, such as additive matrix binding. While absolute magnitude-based mechanisms have been reported for other architectures (e.g. onion representations in RNNs), to our knowledge this is the first controlled demonstration of a relative magnitude mechanism in attention-only transformers. This result challenges the prevailing view in mechanistic interpretability research that transformer features can be treated as binary and independent, and motivates a re-examination of these methods with respect to feature activation values and interactions between features at different values. In future work, we will remove the constraints placed on our toy setting and attempt to find evidence of these mechanisms in LLMs.
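To make the contrast concrete, below is a minimal numerical sketch of the distinction the abstract draws: encoding order by the relative magnitudes of coefficients in a weighted sum, rather than by binding each role to a distinct direction (e.g. via a role-specific matrix). The vectors, weights, and readout here are illustrative assumptions, not the paper's actual model or data.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # illustrative residual-stream dimension

# Random, roughly orthogonal unit vectors standing in for two entity features
# (e.g. "Alice" and "Bob"); purely hypothetical, not learned features.
v_alice = rng.standard_normal(d); v_alice /= np.linalg.norm(v_alice)
v_bob   = rng.standard_normal(d); v_bob   /= np.linalg.norm(v_bob)

def encode_by_relative_magnitude(first, second, w_first=2.0, w_second=1.0):
    # Order is carried only by the ratio of the two coefficients:
    # the same feature directions appear in both orderings.
    return w_first * first + w_second * second

def decode_order(residual, a, b):
    # Read order back out by comparing projections onto each feature vector;
    # the larger projection marks the more strongly weighted (earlier) item.
    return "a_first" if residual @ a > residual @ b else "b_first"

x = encode_by_relative_magnitude(v_alice, v_bob)  # "Alice ... Bob ..."
y = encode_by_relative_magnitude(v_bob, v_alice)  # "Bob ... Alice ..."
print(decode_order(x, v_alice, v_bob))  # -> a_first
print(decode_order(y, v_alice, v_bob))  # -> b_first

# A direction-based scheme such as additive matrix binding would instead
# compute something like W_first @ v_alice + W_second @ v_bob, storing the
# role in a transformed direction rather than in a coefficient ratio.
```

Note that in the magnitude-based sketch a binary on/off reading of each feature loses the ordering information, which is the sense in which such mechanisms complicate the view of features as independent binary switches.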
Submission Number: 5