TL;DR: Using Armijo line-search can lead to exponential improvements in the gradient descent convergence rate for problems such as logistic regression and policy optimization in reinforcement learning.
Abstract: Armijo line-search (Armijo-LS) is a standard method to set the step-size for gradient descent (GD). For smooth functions, Armijo-LS alleviates the need to know the global smoothness constant $L$ and adapts to the ``local'' smoothness, enabling GD to converge faster. Existing theoretical analyses show that GD with Armijo-LS ($\texttt{GD-LS}$) can result in constant factor improvements over GD with a $1/L$ step-size (denoted as $\texttt{GD(1/L)}$). We strengthen these results and show that if the objective function satisfies a certain non-uniform smoothness condition, $\texttt{GD-LS}$ can result in a faster convergence rate than $\texttt{GD(1/L)}$. In particular, we prove that for convex objectives corresponding to logistic regression and multi-class classification, $\texttt{GD-LS}$ can converge to the optimum at a linear rate, and hence improves over the sublinear convergence of $\texttt{GD(1/L)}$. Furthermore, for non-convex objectives satisfying gradient domination (e.g., those corresponding to the softmax policy gradient in RL or generalized linear models with a logistic link function), $\texttt{GD-LS}$ can match the fast convergence of algorithms tailored for these specific settings. Finally, we prove that under the interpolation assumption, for convex losses, stochastic GD with a stochastic line-search can match the fast convergence of $\texttt{GD-LS}$.
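To make the method concrete, here is a minimal sketch of gradient descent with backtracking Armijo line-search applied to binary logistic regression. This is an illustrative implementation, not the authors' code: the function names (`gd_armijo`, `logistic_loss`), the starting step-size `eta_max`, and the constants `c` and `beta` are assumptions chosen for the example.

```python
import numpy as np

def logistic_loss(w, X, y):
    # Binary logistic loss with labels y in {-1, +1}; logaddexp is numerically stable.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def logistic_grad(w, X, y):
    # Gradient of the logistic loss; clip the exponent to avoid overflow warnings.
    m = np.clip(y * (X @ w), -500, 500)
    s = -y / (1.0 + np.exp(m))
    return X.T @ s / len(y)

def gd_armijo(f, grad, w0, n_iters=100, eta_max=100.0, c=0.5, beta=0.8):
    """Gradient descent with backtracking Armijo line-search.

    At each iteration, shrink the step-size eta by beta until the Armijo
    condition holds:
        f(w - eta * g) <= f(w) - c * eta * ||g||^2
    Restarting from eta_max each iteration lets the method adapt to the
    local smoothness rather than the global smoothness constant L.
    """
    w = w0.copy()
    for _ in range(n_iters):
        g = grad(w)
        gnorm2 = g @ g
        if gnorm2 < 1e-12:  # gradient (numerically) zero: stop
            break
        fw, eta = f(w), eta_max
        while f(w - eta * g) > fw - c * eta * gnorm2:
            eta *= beta
        w = w - eta * g
    return w

# Tiny linearly separable example (interpolation holds).
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w0 = np.zeros(2)
w = gd_armijo(lambda v: logistic_loss(v, X, y),
              lambda v: logistic_grad(v, X, y), w0)
```

On separable data like this, the iterates drive the loss toward zero, which is the regime where the paper's linear-rate result for `GD-LS` applies.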
Lay Summary: Gradient descent (GD) is the standard optimization method for training machine learning (ML) models. The performance of GD is sensitive to the choice of its step-size parameter. Armijo line-search is a common technique to "search" for a good step-size in each GD step. Armijo line-search not only makes GD more robust, but also makes the method faster in practice. In this paper, we theoretically characterize how fast it can go.
We show that for common ML objectives such as logistic regression, GD with Armijo line-search (GDLS) can be exponentially faster than using GD with a fixed, pre-determined step-size. Moreover, for specific problems in supervised learning and reinforcement learning, we prove that GDLS can theoretically match or outperform algorithms explicitly designed for these problems.
Our results thus demonstrate the universal effectiveness of GDLS, and show that this classic algorithm is all you need!
Primary Area: Optimization
Keywords: Armijo line-search, (Stochastic) Gradient descent, Convergence rates, Logistic regression, Policy optimization, Generalized linear models, Interpolation
Submission Number: 7865