Keywords: Gradient Descent, Edge of Stability, Convergence Direction
Abstract: Gradient descent (GD) is a fundamental optimization method in deep learning, yet its asymptotic directional properties remain poorly understood. In this paper, we prove that if GD converges, its trajectory either aligns with a fixed direction or oscillates along a specific line. Fixed-direction convergence occurs under small learning rates, while oscillatory convergence emerges for large learning rates. This result offers a new lens for understanding long-term GD dynamics. Experimentally, we find that this directional convergence behavior also appears in stochastic gradient descent (SGD) and Adam. Furthermore, we discuss how these theoretical findings on oscillatory convergence may shed light on the sharpness dynamics observed in the Edge of Stability (EoS) regime. Our work provides both theoretical clarity and practical insight into the long-term dynamics of multiple optimization methods.
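As a hedged illustration of the two regimes described above (not the paper's construction or proof), the following toy sketch runs GD on a 2D quadratic; the curvatures lam1, lam2 and the step sizes are hypothetical choices made only to expose the fixed-direction and oscillatory behaviors of the normalized iterates.

```python
# Illustrative sketch, assuming a 2D quadratic f(x) = 0.5 * x^T A x.
# All parameter values are hypothetical and chosen for demonstration only.
import numpy as np

lam1, lam2 = 1.0, 0.1                 # curvatures along the two eigendirections
A = np.diag([lam1, lam2])

def gd_directions(eta, steps=200, x0=np.array([1.0, 1.0])):
    """Run GD x <- x - eta * A x and record the normalized iterate at each step."""
    x = x0.copy()
    dirs = []
    for _ in range(steps):
        x = x - eta * (A @ x)
        dirs.append(x / np.linalg.norm(x))
    return np.array(dirs)

# Small learning rate: the contraction factor |1 - eta*lam2| dominates,
# so the normalized iterates settle on a single fixed direction.
small = gd_directions(eta=0.5)
print("small eta, last 3 normalized iterates:\n", small[-3:])

# Large (but still convergent) learning rate: 1 - eta*lam1 is negative and
# largest in magnitude, so the normalized iterates flip sign each step --
# the trajectory oscillates along one line while still converging.
large = gd_directions(eta=1.95)
print("large eta, last 3 normalized iterates:\n", large[-3:])
```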
Supplementary Material: pdf
Primary Area: optimization
Submission Number: 8584