Keywords: Dynamical stability, Bifurcation theory, Functional analysis, Koopman
TL;DR: Stability analysis of nonlinear dynamics in GD and SGD
Abstract: The dynamical stability of the iterates during training plays a key role in determining the minima obtained by training algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable properties. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. In this work, we explicitly study the effect of nonlinear terms. For GD, we show that linear analysis can be misleading: the iterates may stably oscillate near a linearly unstable minimum and still converge once the step size decays. We derive an exact condition for such stable oscillations, which depends on higher-order derivatives of the loss. Extending the analysis to stochastic gradient descent (SGD), we demonstrate that the nonlinear dynamics can diverge in expectation if even a single batch is unstable. This implies that stability can be dictated by the worst-case batch rather than by an average effect, as linear analysis would suggest. Finally, we prove that if all batches are linearly stable, then the nonlinear dynamics of SGD are stable in expectation.
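As a toy illustration of the first claim (a minimal sketch, not taken from the paper, with all constants chosen purely for illustration): for the 1D loss L(x) = (1/2) log(1 + x^2), the minimum at x = 0 has curvature 1, so GD with step size eta > 2 is linearly unstable there; yet for 2 < eta < 4 the iterates settle into a bounded period-2 oscillation of amplitude sqrt(eta/2 - 1), and they converge once the step size decays.

```python
# Toy 1D loss L(x) = 0.5 * log(1 + x^2): minimum at x = 0 with curvature L''(0) = 1,
# so gradient descent with step size eta > 2 is linearly unstable at that minimum.
def grad(x):
    return x / (1.0 + x * x)   # dL/dx

eta = 2.5   # above the linear-stability threshold 2 / L''(0) = 2 (illustrative value)
x = 0.1     # small perturbation away from the minimum

# Phase 1: fixed step size. The iterates do not converge, but the higher-order
# terms of the loss keep them in a bounded period-2 oscillation of amplitude
# sqrt(eta/2 - 1) = 0.5 around the linearly unstable minimum.
for t in range(200):
    x -= eta * grad(x)
print(f"fixed step:   x = {x:+.4f}  (oscillates near +/-0.5)")

# Phase 2: decaying step size. Once the step size drops below 2, the minimum
# becomes linearly stable again and the oscillation collapses onto x = 0.
for t in range(200):
    x -= (eta / (1.0 + 0.05 * t)) * grad(x)
print(f"decayed step: x = {x:+.6f}  (converges toward 0)")
```

In this sketch the sign of the higher-order terms is what makes the oscillation stable; for a loss with the opposite sign of the fourth derivative at the minimum, such as L(x) = x^2/2 + x^4/4, the same step size leads to divergence, consistent with the abstract's point that the exact condition depends on higher-order derivatives of the loss.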
Primary Area: optimization
Submission Number: 14355