Directional Smoothness and Gradient Methods: Convergence and Adaptivity

Published: 25 Sept 2024 · Last Modified: 13 Jan 2025 · NeurIPS 2024 poster · CC BY-NC-SA 4.0
Keywords: directional smoothness, gradient descent, exponential search, Polyak step-size, normalized gradient descent
TL;DR: We use directional smoothness to derive new convergence rates for gradient descent that depend only on local properties of the objective.
Abstract: We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants. Key to our proofs is directional smoothness, a measure of gradient variation that we use to develop upper-bounds on the objective. Minimizing these upper-bounds requires solving implicit equations to obtain a sequence of strongly adapted step-sizes; we show that these equations are straightforward to solve for convex quadratics and lead to new guarantees for two classical step-sizes. For general functions, we prove that the Polyak step-size and normalized GD obtain fast, path-dependent rates despite using no knowledge of the directional smoothness. Experiments on logistic regression show our convergence guarantees are tighter than the classical theory based on $L$-smoothness.
Primary Area: Optimization (convex and non-convex, discrete, stochastic, robust)
Submission Number: 3555
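
To make the quantities named in the abstract concrete, here is a minimal sketch (not the authors' code) that runs gradient descent with the Polyak step-size and normalized GD on a small synthetic logistic-regression problem and tracks a local gradient-variation ratio along the iterates. The helper names (`loss`, `grad`, `directional_smoothness_estimate`), the choice `f_star = 0` as a lower bound for the Polyak step, and the specific gradient-variation estimator are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

# Synthetic binary logistic-regression data (a stand-in for the paper's experiments).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))  # labels in {-1, +1}

def loss(w):
    return np.mean(np.log1p(np.exp(-y * (X @ w))))

def grad(w):
    return X.T @ (-y / (1.0 + np.exp(y * (X @ w)))) / n

def directional_smoothness_estimate(w_prev, w_next):
    # Gradient variation between consecutive iterates: a local, path-dependent
    # quantity standing in for a global L-smoothness constant (illustrative only).
    return np.linalg.norm(grad(w_next) - grad(w_prev)) / (np.linalg.norm(w_next - w_prev) + 1e-12)

def polyak_gd(w, f_star=0.0, iters=200):
    # Polyak step-size: eta_k = (f(x_k) - f*) / ||grad f(x_k)||^2,
    # here with f_star = 0 used as a lower bound on the optimal value.
    for _ in range(iters):
        g = grad(w)
        w = w - (loss(w) - f_star) / (g @ g + 1e-12) * g
    return w

def normalized_gd(w, eta=0.1, iters=200):
    # Normalized GD: a fixed step length eta along the negative gradient direction.
    for _ in range(iters):
        g = grad(w)
        w = w - eta * g / (np.linalg.norm(g) + 1e-12)
    return w

w0 = np.zeros(d)
print("Polyak GD loss:    ", loss(polyak_gd(w0)))
print("Normalized GD loss:", loss(normalized_gd(w0)))
```

Both methods use only quantities available along the optimization path, which is the sense in which the abstract's guarantees are path-dependent rather than tied to a global, worst-case smoothness constant.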
