The $\Psi$ Paradox in Extreme Superposition: When ETF Alignment Does Not Predict Language Model Generalization

Published: 02 Mar 2026 · Last Modified: 03 Mar 2026 · ICLR 2026 Workshop ICBINB · CC BY 4.0
Keywords: language models, embedding geometry, frame theory, equiangular tight frames, regularization, negative result, superposition, neural scaling laws
TL;DR: We show that optimizing embeddings toward theoretically optimal Equiangular Tight Frame geometry does not improve language model generalization; simple norm regularization achieves comparable gains.
Abstract: Recent work proposes that embedding geometry in language models should approach Equiangular Tight Frames (ETF), with $\Psi(W) \to 1$ interpreted as "optimal" interference structure. We evaluate this prediction in an extreme superposition setting ($n \gg m$) using Adaptive Superposition Control (ASC), which uses $\Psi$ as a feedback signal to modulate regularization. Across MiniGPT models on TinyShakespeare, ASC reduces validation perplexity and increases $\Psi$, yet $\Psi$ remains far from the ETF target. Moreover, a parameter-free UnitNorm projection matches most of the perplexity gains without explicitly optimizing $\Psi$ (and also improves $\Psi$ as a side effect). These results are consistent with a picture where controlling embedding norm dispersion is a primary driver of the observed gains in this regime, while $\Psi$-based ETF alignment is not a reliable explanatory variable. We discuss limitations, including metric choice, hyperparameter sensitivity, and how these observations relate to normalization used in modern LLMs.
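To make the moving parts concrete, here is a minimal PyTorch sketch of one plausible reading of the setup. The abstract does not define $\Psi$, so `psi_proxy` below is an assumption: a normalized frame potential, which equals 1 exactly for tight frames but measures tightness rather than equiangularity. The names `psi_proxy`, `unitnorm_projection`, and `asc_lambda` are hypothetical, and the actual ASC feedback schedule may differ.

```python
import torch
import torch.nn.functional as F

def psi_proxy(W: torch.Tensor) -> torch.Tensor:
    """Hypothetical ETF-alignment proxy (the paper's exact Psi is not
    reproduced here). For n unit-norm rows in R^m, the frame potential
    sum_ij <w_i, w_j>^2 is minimized at n^2 / m by tight frames, so this
    ratio lies in (0, 1] and approaches 1 as W nears a tight frame."""
    n, m = W.shape
    Wn = F.normalize(W, dim=1)        # project rows onto the unit sphere
    fp = (Wn @ Wn.T).pow(2).sum()     # frame potential (includes diagonal)
    return (n * n / m) / fp

def unitnorm_projection(W: torch.Tensor) -> torch.Tensor:
    """Parameter-free baseline: rescale every embedding row to unit norm,
    removing norm dispersion without optimizing Psi directly."""
    return F.normalize(W, dim=1)

def asc_lambda(psi: float, base_lambda: float = 1e-4,
               target: float = 1.0) -> float:
    """One plausible ASC-style feedback rule (an assumption, not the
    paper's schedule): scale the regularization strength by the gap
    between the current Psi and the ETF target."""
    return base_lambda * max(target - psi, 0.0)
```

In this reading, a training loop would call `asc_lambda(psi_proxy(embed.weight).item())` each step to set the penalty weight, or instead apply `unitnorm_projection` after each optimizer step as the parameter-free baseline; both are illustrations of the abstract, not the paper's implementation.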
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 8