Keywords: Dynamical stability, Bifurcation theory, Functional analysis, Koopman
TL;DR: Stability analysis of nonlinear dynamics in GD and SGD
Abstract: The dynamical stability of the iterates during training plays a key role in determining the minima obtained by training algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable properties. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. In this work, we explicitly study the effect of nonlinear terms. For GD, we show that linear analysis can be misleading: the iterates may stably oscillate near a linearly unstable minimum and still converge once the step size decays. We derive an exact condition for such stable oscillations, which depends on higher-order derivatives of the loss. Extending the analysis to stochastic gradient descent (SGD), we demonstrate that the nonlinear dynamics can diverge in expectation if even a single batch is unstable. This implies that stability can be dictated by the worst-case batch rather than by an average effect, as linear analysis would suggest. Finally, we prove that if all batches are linearly stable, then the nonlinear dynamics of SGD are stable in expectation.
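As a toy illustration of the first claim (a minimal sketch, not taken from the paper, with all constants chosen purely for illustration): for the 1D loss L(x) = (1/2) log(1 + x^2), the minimum at x = 0 has curvature 1, so GD with step size eta > 2 is linearly unstable there; yet for 2 < eta < 4 the iterates settle into a bounded period-2 oscillation of amplitude sqrt(eta/2 - 1), and they converge once the step size decays.

```python
# Toy 1D loss L(x) = 0.5 * log(1 + x^2): minimum at x = 0 with curvature L''(0) = 1,
# so gradient descent with step size eta > 2 is linearly unstable at that minimum.
def grad(x):
    return x / (1.0 + x * x)   # dL/dx

eta = 2.5   # above the linear-stability threshold 2 / L''(0) = 2 (illustrative value)
x = 0.1     # small perturbation away from the minimum

# Phase 1: fixed step size. The iterates do not converge, but the higher-order
# terms of the loss keep them in a bounded period-2 oscillation of amplitude
# sqrt(eta/2 - 1) = 0.5 around the linearly unstable minimum.
for t in range(200):
    x -= eta * grad(x)
print(f"fixed step:   x = {x:+.4f}  (oscillates near +/-0.5)")

# Phase 2: decaying step size. Once the step size drops below 2, the minimum
# becomes linearly stable again and the oscillation collapses onto x = 0.
for t in range(200):
    x -= (eta / (1.0 + 0.05 * t)) * grad(x)
print(f"decayed step: x = {x:+.6f}  (converges toward 0)")
```

In this sketch the sign of the higher-order terms is what makes the oscillation stable; for a loss with the opposite sign of the fourth derivative at the minimum, such as L(x) = x^2/2 + x^4/4, the same step size leads to divergence, consistent with the abstract's point that the exact condition depends on higher-order derivatives of the loss.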
Primary Area: optimization
Submission Number: 14355