What Arranges Features in Activation Space? Non-Classical Predictive Geometry in Next-Token Predictors

Adam Shai; Thomas Joseph Elliott; Paul M. Riechers

Published: 11 Jun 2026, Last Modified: 11 Jun 2026; Mech Interp Workshop ICML 2026 Virtual; poster; Everyone; CC BY 4.0

Keywords: Feature Geometry

TL;DR: Non-Classical Predictive Geometry Explains the Arrangement of Features in Activation Space.

Abstract: Mechanistic interpretability often studies the local features and circuits that implement model computations. What principles govern the arrangement of these features and circuits into geometric structures in activation space? To make this tractable, we study how the computational class of the training-data generator constrains the geometry of predictive states. We show that while the data distribution determines which features are required for prediction, a predictor realizes those features as beliefs about its current latent state, and the generator class determines the geometry of those beliefs. Using this theoretical insight, we design synthetic datasets whose minimal predictive representations fall into different model classes, and test which geometry neural networks learn. In particular, we train transformers, LSTMs, GRUs, and vanilla RNNs on datasets whose predictive geometries are known analytically: a classical HMM process, a quantum-realizable process with no finite-state HMM realization, and a generalized-probabilistic process with no finite-dimensional quantum realization. Across architectures, a single affine map from activations decodes the corresponding predictive representation in each case: HMM beliefs in a latent simplex, Bloch-vector quantum states, or a finite-dimensional generalized predictive vector. These representations emerge during training and fit the compact non-classical geometry far better than finite-order classical Markov baselines. These results suggest that understanding predictive representations requires asking not only which features a network represents, but what geometry organizes those features.
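For the classical HMM case, the two ingredients named in the abstract (beliefs over latent states, and an affine map that decodes them from activations) can be illustrated with a minimal sketch. This is not the authors' code: the toy 3-state HMM, every function name, and the synthetic stand-in "activations" (a noisy random affine image of the beliefs, used here in place of real network activations) are assumptions made purely for illustration.

```python
import numpy as np


def make_toy_hmm():
    """Hypothetical 3-state, 2-token HMM (not a process from the paper).
    T[x, i, j] = P(emit token x, move to state j | current state i)."""
    A = np.array([[0.7, 0.2, 0.1],   # state-to-state transition probabilities
                  [0.1, 0.7, 0.2],
                  [0.2, 0.1, 0.7]])
    E = np.array([[0.9, 0.1],        # E[i, x] = P(emit token x | state i)
                  [0.5, 0.5],
                  [0.1, 0.9]])
    T = np.einsum("ix,ij->xij", E, A)
    init = np.full(3, 1.0 / 3.0)     # uniform prior over hidden states
    return T, init


def belief_update(belief, T_x):
    """Bayesian update of the belief over hidden states after seeing one token."""
    unnorm = belief @ T_x
    return unnorm / unnorm.sum()


def sample_beliefs(T, init, length, rng):
    """Sample tokens from the HMM and record the belief after each token."""
    n_states = T.shape[1]
    state = rng.choice(n_states, p=init)
    belief = init.copy()
    beliefs = []
    for _ in range(length):
        # Joint distribution over (token, next state) given the true state.
        joint = T[:, state, :].ravel()
        token, state = divmod(rng.choice(joint.size, p=joint), n_states)
        belief = belief_update(belief, T[token])
        beliefs.append(belief)
    return np.array(beliefs)


def fit_affine_probe(acts, targets):
    """Least-squares affine map (weights plus bias) from activations to targets."""
    X = np.hstack([acts, np.ones((len(acts), 1))])
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return W


def apply_affine_probe(W, acts):
    """Apply a probe returned by fit_affine_probe to a batch of activations."""
    X = np.hstack([acts, np.ones((len(acts), 1))])
    return X @ W


def synthetic_activations(beliefs, dim, noise, rng):
    """Stand-in for network activations: a noisy random affine image of beliefs."""
    M = rng.normal(size=(beliefs.shape[1], dim))
    b = rng.normal(size=dim)
    return beliefs @ M + b + noise * rng.normal(size=(len(beliefs), dim))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, init = make_toy_hmm()
    beliefs = sample_beliefs(T, init, 2000, rng)
    acts = synthetic_activations(beliefs, dim=16, noise=0.01, rng=rng)
    W = fit_affine_probe(acts, beliefs)
    mse = np.mean((apply_affine_probe(W, acts) - beliefs) ** 2)
    print("probe mean squared error:", mse)
```

With a trained network, `acts` would instead be activations collected at each sequence position, and the probe would be evaluated on held-out sequences. The quantum and generalized-probabilistic cases described in the abstract would use different target vectors and are not covered by this sketch.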

Submission Number: 545
