A novel analysis of gradient descent under directional smoothness

Aaron Mishkin; Ahmed Khaled; Aaron Defazio; Robert M. Gower

Published: 26 Oct 2023, Last Modified: 13 Dec 2023, NeurIPS 2023 Workshop Poster

Keywords: directional smoothness, gradient descent, exponential search

TL;DR: Using directional smoothness, we derive new convergence rates for gradient descent that depend only on local properties of the objective.
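
A short worked derivation may help make the TL;DR concrete. It is a sketch under an assumed notation, not necessarily the paper's exact definitions: suppose M(x, y) is any function for which the upper bound below holds. Plugging in one gradient step gives a decrease guarantee that depends on M only at the current pair of iterates. Minimizing that bound over the step size gives an equation that is implicit, because M_k itself depends on the step size through x_{k+1}. The final lines record one valid choice of M for convex f.

```latex
% Assumed directional-smoothness upper bound:
f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{M(x, y)}{2}\,\|y - x\|^2

% One gradient step x_{k+1} = x_k - \eta_k \nabla f(x_k), with M_k := M(x_k, x_{k+1}):
f(x_{k+1}) \le f(x_k) - \eta_k \Bigl(1 - \frac{\eta_k M_k}{2}\Bigr) \|\nabla f(x_k)\|^2

% Minimizing the right-hand side in \eta_k with M_k held fixed gives an implicit equation:
\eta_k = \frac{1}{M\bigl(x_k,\; x_k - \eta_k \nabla f(x_k)\bigr)}
\quad\Longrightarrow\quad
f(x_{k+1}) \le f(x_k) - \frac{\|\nabla f(x_k)\|^2}{2 M_k}

% Example for convex f: M(x, y) = 2\|\nabla f(y) - \nabla f(x)\| / \|y - x\| satisfies the bound, since
% f(y) \le f(x) + \langle \nabla f(y), y - x \rangle
%      \le f(x) + \langle \nabla f(x), y - x \rangle + \|\nabla f(y) - \nabla f(x)\|\,\|y - x\|.
```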

Abstract: We develop new sub-optimality bounds for gradient descent that depend on the conditioning of the objective along the optimization path rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to derive upper bounds on the objective. Minimizing these upper bounds requires solving an implicit equation for an adapted step size; we show that this equation is straightforward to solve for convex quadratics and that doing so leads to new guarantees for a classical step-size sequence. For general functions, we prove that exponential search yields a path-dependent convergence guarantee with only a log-log dependence on the global smoothness constant. Experiments on quadratic functions illustrate the utility of our theory and its connections to the edge-of-stability phenomenon.
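
A minimal NumPy sketch of the convex-quadratic case follows. It is not the authors' code. It assumes f(x) = ½ xᵀAx − bᵀx with A symmetric positive definite, and it takes the directional smoothness to be the curvature along the step, M(x, y) = (y − x)ᵀA(y − x) / ‖y − x‖². This choice is an assumption. With it, the upper bound holds with equality and M does not depend on the step length. The implicit equation η = 1/M(x, x − η∇f(x)) therefore has the closed-form solution η = ‖g‖² / (gᵀAg), which coincides with the exact line-search step. The helper names are hypothetical. The sketch does not cover the exponential search used for general functions.

```python
import numpy as np


def make_quadratic(A, b):
    """f(x) = 0.5 x^T A x - b^T x and its gradient (A assumed symmetric positive definite)."""
    def f(x):
        return 0.5 * x @ A @ x - b @ x

    def grad(x):
        return A @ x - b

    return f, grad


def directional_smoothness(A, x, y):
    """Assumed choice of M(x, y): curvature of the quadratic along y - x.

    For a quadratic this makes the upper bound
    f(y) <= f(x) + <grad f(x), y - x> + M/2 ||y - x||^2 hold with equality.
    """
    d = y - x
    return (d @ A @ d) / (d @ d)


def adapted_step(A, g):
    """Closed-form solution of eta = 1 / M(x, x - eta * g) for this choice of M."""
    return (g @ g) / (g @ A @ g)


def gradient_descent(A, b, x0, iters=500, tol=1e-12):
    """Gradient descent with the adapted step; returns the final iterate and f-values."""
    f, grad = make_quadratic(A, b)
    x = np.asarray(x0, dtype=float)
    history = [f(x)]
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:  # numerically at the minimizer
            break
        eta = adapted_step(A, g)
        x_next = x - eta * g
        M = directional_smoothness(A, x, x_next)
        # With eta = 1/M the bound gives f(x_next) <= f(x) - ||g||^2 / (2M);
        # for a quadratic this holds with equality up to rounding.
        assert f(x_next) <= f(x) - (g @ g) / (2 * M) + 1e-9 * max(1.0, abs(f(x)))
        x = x_next
        history.append(f(x))
    return x, history


# Usage on a small, hypothetical problem.
A = np.diag([1.0, 4.0, 9.0])
b = np.ones(3)
x, history = gradient_descent(A, b, np.zeros(3))
print(x)  # close to np.linalg.solve(A, b)
```

The in-loop assertion checks the one-step decrease implied by the upper bound at η = 1/M. For non-quadratic objectives, M generally depends on η, so no closed form like `adapted_step` is available.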

Submission Number: 77
