Understanding Edge of Stability in Rank-1 Linear Models for Binary Classification

ICLR 2026 Conference Submission 22073 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: gradient descent, optimization, edge of stability, training dynamics, linear networks
Abstract: Recent research in deep learning optimization reveals that many neural network architectures trained using gradient descent with practical step sizes, $\eta$, exhibit an interesting phenomenon where the top eigenvalue of the Hessian of the loss function, $\lambda_1^H$ increases to and oscillates about the stability threshold, $\frac{2}{\eta}$. The two parts of the trajectory are referred to as progressive sharpening and edge of stability. The oscillation in $\lambda_1^H$ is accompanied by a non-monotonically decreasing training loss. In this work, we study the Edge of Stability phenomenon in a two-layer rank-$1$ linear model for the binary classification task with linearly separable data to minimize logistic loss. By capturing the core training dynamics of our model as a low-dimensional system, we rigorously prove that Edge of Stability behavior is not possible in the simplest one datapoint setting. We also empirically show that, with two datapoints, it is possible for Edge of Stability to occur and point out the source of the oscillation in $\lambda_1^H$ and non-monotonic training loss. We also give new approximations to $\lambda_1^H$ for such models. Lastly, we consider an asymptotic setting, in the limit as the margin converges to $0$, and provide empirical results that suggest the loss and sharpness trajectories may exhibit stable, perpetual oscillation.
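As a rough illustration of the setting described in the abstract (not the authors' code), the sketch below trains a two-layer rank-$1$ linear model $f(x) = a\,(w \cdot x)$ with full-batch gradient descent on logistic loss over two linearly separable points, and tracks the training loss and the sharpness $\lambda_1^H$ against the stability threshold $2/\eta$. The data, step size, initialization, and the finite-difference Hessian are illustrative assumptions; whether Edge of Stability oscillation actually appears depends on these choices.

```python
import numpy as np

def loss(theta, X, y):
    """Mean logistic loss of f(x) = a * (w . x), with theta = [a, w_1, ..., w_d]."""
    a, w = theta[0], theta[1:]
    margins = y * a * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins))

def grad(theta, X, y):
    """Gradient of the mean logistic loss with respect to theta."""
    a, w = theta[0], theta[1:]
    z = X @ w
    margins = y * a * z
    # d/dm of log(1 + exp(-m)) = -sigmoid(-m), written in a numerically stable form
    dldm = -np.exp(-np.logaddexp(0.0, margins))
    da = np.mean(dldm * y * z)
    dw = X.T @ (dldm * y * a) / len(y)
    return np.concatenate(([da], dw))

def top_hessian_eigenvalue(theta, X, y, eps=1e-5):
    """Sharpness lambda_1^H via a central-difference Hessian of the small parameter vector."""
    d = theta.size
    H = np.zeros((d, d))
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        H[:, i] = (grad(theta + e, X, y) - grad(theta - e, X, y)) / (2.0 * eps)
    H = 0.5 * (H + H.T)  # symmetrize away numerical noise
    return np.linalg.eigvalsh(H)[-1]

# Two linearly separable points in R^2 and a step size eta; both are illustrative choices.
X = np.array([[1.0, 0.5],
              [-0.8, -1.2]])
y = np.array([1.0, -1.0])
eta = 1.0                              # stability threshold is 2 / eta
theta = np.array([0.5, 0.3, -0.2])     # [a, w_1, w_2], arbitrary small initialization

for t in range(201):
    if t % 20 == 0:
        lam1 = top_hessian_eigenvalue(theta, X, y)
        print(f"step {t:3d}  loss {loss(theta, X, y):.4f}  "
              f"lambda_1^H {lam1:.4f}  2/eta {2.0 / eta:.4f}")
    theta = theta - eta * grad(theta, X, y)
```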
Primary Area: optimization
Submission Number: 22073