Bandwidth-based Step-Sizes for Non-Convex Stochastic Optimization

Published: 28 Jan 2022, Last Modified: 13 Feb 2023 · ICLR 2022 Submission · Readers: Everyone
Keywords: Stochastic gradient descent, bandwidth-based step size, non-asymptotic analysis
Abstract: Many popular learning-rate schedules for deep neural networks combine a decaying trend with local perturbations that attempt to escape saddle points and bad local minima. We derive convergence guarantees for bandwidth-based step-sizes, a general class of learning rates that are allowed to vary within a banded region. This framework includes many popular cyclic and non-monotonic step-sizes for which no theoretical guarantees were previously known. We provide worst-case guarantees for SGD on smooth non-convex problems under several bandwidth-based step-sizes, including stagewise $1/\sqrt{t}$ and the popular \emph{step-decay} (``constant and then drop by a constant''), which is also shown to be optimal. Moreover, we show that the momentum variant of SGD converges as fast as SGD under the bandwidth-based step-decay step-size. Finally, we propose novel step-size schemes in the bandwidth-based family and verify their efficiency on several deep neural network training tasks.
One-sentence Summary: We derive convergence guarantees for bandwidth-based step-sizes, a general class of learning-rates that are allowed to vary in a banded region.
Supplementary Material: zip
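
For intuition, below is a minimal Python sketch of one member of the bandwidth-based family described in the abstract: a step-decay schedule whose value may oscillate inside a band that drops by a constant factor at the end of each stage. The function name bandwidth_step_size and the parameters eta0, decay, stage_len, and wobble are illustrative assumptions for this sketch, not the authors' implementation or experimental settings.

    import math


    def bandwidth_step_size(t, eta0=0.1, decay=0.5, stage_len=30, wobble=0.25):
        """Illustrative bandwidth-based step-decay schedule (a sketch, not the paper's scheme).

        The step size is confined to a band [lo, hi] around a reference value that
        drops by the constant factor `decay` after every stage of `stage_len` steps
        ("constant and then drop by a constant"); within a stage the step size may
        vary, here via a cosine perturbation, without leaving the band.
        """
        stage = t // stage_len                     # current stage index
        center = eta0 * (decay ** stage)           # stagewise-decayed reference value
        lo, hi = (1 - wobble) * center, (1 + wobble) * center  # banded region
        # Any value inside [lo, hi] is admissible; pick a cyclic point in the band.
        phase = (t % stage_len) / stage_len
        eta = center * (1 + wobble * math.cos(2 * math.pi * phase))
        return min(max(eta, lo), hi)               # clip to the band for safety


    if __name__ == "__main__":
        for t in (0, 10, 29, 30, 60, 90):
            print(t, round(bandwidth_step_size(t), 5))

Any rule that stays inside the band, whether cyclic, non-monotonic, or simply constant per stage, fits the same bandwidth framework; the cosine perturbation above is just one concrete choice used for illustration.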