The Geometry of Spectral Gradient Descent: Layerwise Criteria for SignSGD vs SpecSGD

Published: 02 Mar 2026, Last Modified: 04 Mar 2026 · ICLR 2026 Workshop GRaM Poster · CC BY 4.0
Track: tiny paper (up to 4 pages)
Keywords: Non-Euclidean optimization, Spectral Gradient Descent
TL;DR: We introduce a noise-aware one-step criterion for choosing between entrywise and spectral sign updates at the layer level.
Abstract: Optimization in deep learning has expanded beyond Euclidean methods to include entrywise sign updates (SignSGD) and spectral sign updates (SpecGD/Muon). While both can be viewed as steepest descent under non-Euclidean geometries (the $\ell_\infty$ and spectral norms, respectively), existing theory analyzes them in isolation. This leaves a practical gap: for a specific layer at a finite batch size, which geometry yields better optimization progress? We address this by deriving a unified bound on the expected one-step loss improvement that accounts for both the local geometry of the gradient signal and the geometry of the stochastic noise. We introduce the \emph{Spec-Sign Advantage Index}, a layerwise scalar criterion for choosing between the two updates. Our analysis reveals that SpecGD is preferred not simply when gradients are low-rank, but when the nuclear-norm signal-to-noise ratio exceeds the $\ell_1$-norm signal-to-noise ratio, generalizing recent deterministic condition checks to the stochastic regime.
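The criterion in the abstract compares a nuclear-norm signal-to-noise ratio against an entrywise $\ell_1$ one. A minimal sketch of that comparison, assuming per-sample gradient matrices and taking the mean as signal and per-sample deviations as noise (the paper's exact noise terms and the precise form of the index may differ; the function name and normalization here are illustrative):

```python
import numpy as np

def spec_sign_advantage_index(grads, eps=1e-12):
    """Illustrative sketch of a layerwise spec-vs-sign criterion.

    grads: array of shape [B, m, n], one gradient matrix per sample.
    Returns (nuclear-norm SNR) - (entrywise l1-norm SNR); a positive
    value suggests the spectral (SpecGD-style) update for this layer,
    a negative value the entrywise sign (SignSGD-style) update.
    """
    G = grads.mean(axis=0)                        # signal: mean gradient
    dev = grads - G                               # noise: per-sample deviations
    nuc_sig = np.linalg.norm(G, ord="nuc")        # ||G||_* = sum of singular values
    l1_sig = np.abs(G).sum()                      # entrywise ||G||_1
    nuc_noise = np.mean([np.linalg.norm(D, ord="nuc") for D in dev])
    l1_noise = np.mean([np.abs(D).sum() for D in dev])
    return nuc_sig / (nuc_noise + eps) - l1_sig / (l1_noise + eps)
```

Note that because $\|G\|_* \le \|G\|_1$ always holds, the spectral side wins only when the noise is "denser" (larger $\ell_1$-to-nuclear ratio) than the signal; a full-rank identity-like signal corrupted by rank-one all-ones noise already yields a positive index, illustrating the abstract's point that low-rank gradients alone are not the deciding factor.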
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 106