Edge of Stochastic Stability: Revisiting the Edge of Stability for SGD

ICLR 2026 Conference Submission 15914 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Edge of Stability, Optimization for deep learning, SGD, Instabilities of Training
Abstract: Recent findings by Cohen et al. demonstrate that when training neural networks with full-batch gradient descent with step size $\eta$, the largest eigenvalue $\lambda_{\max}$ of the full-batch Hessian consistently stabilizes at $2/\eta$. These results have significant implications for convergence and generalization. However, this behavior does not carry over to mini-batch stochastic gradient descent (SGD), limiting the broader applicability of these consequences. We show that SGD trains in a different regime we term Edge of Stochastic Stability (EoSS). In this regime, what stabilizes at $2/\eta$ is Batch Sharpness: the expected directional curvature of mini-batch Hessians along their corresponding stochastic gradients. As a consequence, $\lambda_{\max}$ (which is generally smaller than Batch Sharpness) is suppressed, aligning with the long-standing empirical observation that smaller batches and larger step sizes favor flatter minima. We further discuss implications for mathematical modeling of SGD trajectories.
Primary Area: optimization
Submission Number: 15914
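
A plausible formalization of the Batch Sharpness quantity described in the abstract, written here only as a sketch: the symbols $B$, $L_B$, $g_B$, $H_B$, and $\theta$ are introduced for illustration, and the normalization by the squared mini-batch gradient norm is an assumption rather than a definition taken from the paper. For a mini-batch $B$ with loss $L_B$, stochastic gradient $g_B = \nabla_\theta L_B(\theta)$, and mini-batch Hessian $H_B = \nabla^2_\theta L_B(\theta)$,

$$\text{Batch Sharpness}(\theta) \;=\; \mathbb{E}_{B}\!\left[\frac{g_B^{\top} H_B\, g_B}{\lVert g_B\rVert^{2}}\right],$$

i.e., the expected curvature of the mini-batch Hessian along the direction of its own stochastic gradient. Under this reading, the EoSS regime corresponds to this quantity stabilizing at $2/\eta$, in analogy with $\lambda_{\max} = 2/\eta$ for full-batch gradient descent.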