Exact linear-rate gradient descent: optimal adaptive stepsize theory and practical use

23 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: gradient descent, adaptive stepsize/learning rate, universal optimal choice, exact convergence rate
TL;DR: We establish a general adaptive stepsize theory for gradient descent, including the feasible selection range, the optimal choice, and the exact convergence rate.
Abstract: Consider gradient descent iterations $ {x}^{k+1} = {x}^k - \alpha_k \nabla f ({x}^k) $. Suppose the gradient exists and $ \nabla f ({x}^k) \neq {0}$. We propose the following closed-form stepsize choice: \begin{equation} \alpha_k^\star = \frac{ \Vert {x}^\star - {x}^k \Vert }{\left\Vert \nabla f({x}^k) \right\Vert} \cos\eta_k , \tag{theoretical} \end{equation} where $ \eta_k $ is the angle between the vectors $ {x}^\star - {x}^k $ and $ -\nabla f({x}^k) $. It is universally applicable and admits an exact linear convergence rate with factor $ \sin^2\eta_k $. Moreover, if $ f $ is convex and $ L $-smooth, then $ \alpha_k^\star \geq {1}/{L} $. For practical use, we approximate (possibly exactly) the above via \begin{equation} \alpha_{k}^\dagger = \gamma_0 \cdot \frac{ f({x}^k) - \bar{f}_0 }{\Vert \nabla f ( {x}^k ) \Vert^2 } , \tag{practical use} \end{equation} where $\gamma_0 $ is a tunable parameter and $ \bar{f}_0 $ is a guess of the smallest objective value (which can be updated automatically). If $ f $ is convex and $ \bar{f}_0 = f ( {x}^\star ) $, then any choice $\gamma_0 \in (0,2] $ guarantees exact linear-rate convergence to the optimal point. We consider a few examples. (i) An $ \mathbb{R}^2 $ quadratic program, where a well-known ill-conditioning bottleneck is addressed, with a rate strictly better than $ O(1/2^k) $. (ii) A geometric program, where an inaccurate guess $ \bar{f}_0 $ remains powerful. (iii) A non-convex MNIST classification problem via neural networks, where preliminary tests show that our method outperforms state-of-the-art algorithms; in particular, a tune-free version is available in some settings.
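The following is a minimal sketch of the practical-use stepsize rule $ \alpha_{k}^\dagger = \gamma_0 ( f({x}^k) - \bar{f}_0 ) / \Vert \nabla f ( {x}^k ) \Vert^2 $ on a toy ill-conditioned 2-D quadratic, in the spirit of example (i). The specific matrix, the choices $\gamma_0 = 1$ and $\bar{f}_0 = 0 = f({x}^\star)$, and the iteration budget are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Toy ill-conditioned 2-D quadratic: f(x) = 0.5 * x^T A x, minimized at x* = 0,
# so the true optimal value is f(x*) = 0, used here as the guess \bar{f}_0.
A = np.diag([1.0, 100.0])              # condition number 100 (illustrative choice)
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

def gd_practical_stepsize(x0, gamma0=1.0, f_bar0=0.0, iters=50):
    """Gradient descent with the 'practical use' adaptive stepsize
    alpha_k = gamma0 * (f(x_k) - f_bar0) / ||grad f(x_k)||^2."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        gnorm2 = g @ g
        if gnorm2 == 0.0:              # already stationary; stop
            break
        alpha = gamma0 * (f(x) - f_bar0) / gnorm2
        x = x - alpha * g
    return x

x_final = gd_practical_stepsize(x0=[1.0, 1.0], gamma0=1.0)
print(x_final, f(x_final))             # should approach the minimizer x* = 0
```

With an exact guess $\bar{f}_0 = f({x}^\star)$ this rule reduces to a Polyak-type stepsize scaled by $\gamma_0$; the abstract's claim is that, for convex $f$, any $\gamma_0 \in (0,2]$ retains exact linear-rate convergence.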
Supplementary Material: pdf
Primary Area: optimization
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2962
