Keywords: Foundational work
Other Keywords: Superposition, Feature geometry, Linear Representation Hypothesis
TL;DR: Small latent spaces and weight decay lead to linear PCA-like superposition under which feature geometry reflects the correlations in the data, explaining previously observed feature geometries.
Abstract: Recent advances in mechanistic interpretability have shown that many features of deep learning models can be captured by dictionary learning approaches such as sparse autoencoders. However, our geometric intuition for how features arrange themselves in a representation space is still limited. "Toy-model" analyses have shown that, in an idealized setting, features can be arranged in local structures, such as small regular polytopes, through a phenomenon known as _superposition_. Yet these local structures have not been observed in real language models. In contrast, these models display rich structures, like ordered circles for the months of the year or semantic clusters, which are not predicted by current theories. In this work, we introduce Bag-of-Words Superposition (BOWS), a framework in which autoencoders with a ReLU in the decoder are trained to compress sparse, binary bag-of-words vectors drawn from Internet-scale text. This simple setup reveals the existence of a _linear regime_ of superposition, which appears in ReLU autoencoders with small latent sizes or trained with weight decay. We show that this linear, PCA-like superposition naturally gives rise to the same semantically rich structures observed in real language models. Code is available at https://anonymous.4open.science/r/correlations-feature-geometry-AF54.
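A minimal sketch of the setup described in the abstract: an autoencoder with a linear encoder and a ReLU in the decoder, trained to reconstruct sparse, binary bag-of-words vectors, with weight decay as one of the knobs said to induce the linear regime. All hyperparameters, the loss, and the synthetic data generation are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a BOWS-style autoencoder (assumed details: sizes, sparsity, loss, optimizer).
import torch
import torch.nn as nn

class BOWSAutoencoder(nn.Module):
    """Linear encoder into a small latent space; decoder output passed through a ReLU."""
    def __init__(self, vocab_size: int, latent_size: int):
        super().__init__()
        self.encoder = nn.Linear(vocab_size, latent_size, bias=False)
        self.decoder = nn.Linear(latent_size, vocab_size, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)                   # compress into the small latent space
        return torch.relu(self.decoder(z))    # ReLU in the decoder, matching non-negative BoW targets

# Hypothetical sparse, binary bag-of-words batch (the paper draws these from Internet-scale text).
vocab_size, latent_size, batch_size = 1000, 16, 256
x = (torch.rand(batch_size, vocab_size) < 0.01).float()  # ~1% of words active per "document"

model = BOWSAutoencoder(vocab_size, latent_size)
# Weight decay is one of the two conditions the abstract associates with the linear regime.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)

for step in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), x)  # reconstruct the binary bag-of-words vector
    loss.backward()
    opt.step()
```

Under the assumed reconstruction objective, shrinking `latent_size` or increasing `weight_decay` is the kind of intervention the abstract claims pushes the model toward PCA-like, linear superposition.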
Submission Number: 272