Pre-Normalization Momentum Governs Optimizer-Induced Rank Bias

Raghav Kaushik Ravi; Srivarshinee Sridhar

Published: 24 May 2026, Last Modified: 28 May 2026

Venue: ICML 2026 Workshop WSS Poster

License: CC BY 4.0

Keywords: Optimization Dynamics, Implicit Bias, Adaptive Optimizers, Low-Rank Structure, Spectral Bias, Deep Matrix Factorization

Abstract: Adaptive optimizers induce strikingly different implicit rank biases: Adam reliably recovers low-rank solutions in deep matrix factorization, whereas RMSProp catastrophically overshoots despite using the same adaptive normalization mechanism $g/\sqrt{v}$. We identify the responsible component as Adam's \emph{pre-normalization momentum filter}, the $\beta_1$ exponential moving average applied to raw gradients before adaptive normalization. We show that adaptive normalization removes the depth-dependent suppression underlying incremental rank learning, exposing adaptive methods to trailing-spectrum inflation. Under stationary trailing-direction noise, Adam's pre-normalization filter scales update variance by a factor of $(1-\beta_1)/(1+\beta_1)$, which corresponds to a standard-deviation factor of $\approx 0.23$ at $\beta_1=0.9$. Controlled falsification experiments isolate this mechanism directly: removing $\beta_1$ breaks Adam's low-rank recovery, while post-normalization momentum fails to reproduce it. Sweeping $\beta_1$ reveals a sharp threshold, with stable low-rank recovery emerging only for $\beta_1 \gtrsim 0.7$. Finally, the mechanism transfers quantitatively to TinyLlama fine-tuning, where the observed RMSProp-to-AdamW update-magnitude ratio closely matches the theoretical prediction. Our results identify pre-normalization temporal filtering as a previously uncharacterized source of optimizer-induced spectral bias.
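
The variance figure quoted in the abstract can be checked with a minimal sketch. It is not the authors' code. It assumes the trailing-direction gradient is zero-mean, unit-variance noise that is uncorrelated across steps, and it looks only at the $\beta_1$ moving average, leaving out the $\sqrt{v}$ normalization and Adam's bias correction (which tends to 1 at stationarity). The function names `ema_variance_factor` and `simulate_ema_std` are hypothetical.

```python
import math
import random


def ema_variance_factor(beta1: float) -> float:
    """Stationary variance of an EMA driven by i.i.d. unit-variance noise.

    For m_t = beta1 * m_{t-1} + (1 - beta1) * g_t, the stationary variance is
    (1 - beta1)^2 / (1 - beta1^2) = (1 - beta1) / (1 + beta1).
    """
    return (1.0 - beta1) / (1.0 + beta1)


def simulate_ema_std(beta1: float, steps: int = 200_000, seed: int = 0) -> float:
    """Empirical standard deviation of the EMA after a short burn-in."""
    rng = random.Random(seed)
    m = 0.0
    burn_in = 1_000  # discard the transient from the m = 0 initialization
    total = 0.0
    total_sq = 0.0
    count = 0
    for t in range(steps):
        g = rng.gauss(0.0, 1.0)  # assumed noise model: i.i.d. standard normal
        m = beta1 * m + (1.0 - beta1) * g
        if t >= burn_in:
            total += m
            total_sq += m * m
            count += 1
    mean = total / count
    return math.sqrt(total_sq / count - mean * mean)


theory_std = math.sqrt(ema_variance_factor(0.9))
empirical_std = simulate_ema_std(0.9)
print(f"theory {theory_std:.4f}  simulation {empirical_std:.4f}")
```

At $\beta_1 = 0.9$ the closed form gives a variance factor of $0.1/1.9 \approx 0.053$, whose square root is about $0.23$. Under the same noise assumption, setting $\beta_1 = 0$ leaves the noise unfiltered (factor 1).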

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 73
