A Second look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance

Published: 01 Jan 2021, Last Modified: 28 Apr 2023 · ICML 2021
Abstract: Stochastic Gradient Descent (SGD) is a popular tool in training large-scale machine learning models. Its performance, however, is highly variable, depending crucially on the choice of the step size...
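The abstract's two title schedules can be illustrated with a minimal sketch. The formulas below are the standard textbook forms of exponential decay and cosine annealing, not code from the paper itself; the function and parameter names (`eta0`, `alpha`, `T`) are illustrative assumptions.

```python
import math

def exponential_step_size(eta0: float, alpha: float, t: int) -> float:
    # Standard exponential decay: eta_t = eta0 * alpha^t, with alpha in (0, 1).
    # (Illustrative form; the paper may parameterize the decay differently.)
    return eta0 * (alpha ** t)

def cosine_step_size(eta0: float, t: int, T: int) -> float:
    # Standard cosine annealing: eta_t = (eta0 / 2) * (1 + cos(pi * t / T)),
    # which decreases smoothly from eta0 at t=0 to 0 at t=T.
    return 0.5 * eta0 * (1 + math.cos(math.pi * t / T))
```

Both schedules require only an initial step size (plus a decay rate or horizon), which is the simplicity the title refers to: no per-coordinate statistics are maintained, unlike adaptive methods such as Adam.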