A Second look at Exponential and Cosine Step Sizes: Simplicity, Adaptivity, and Performance

Published: 01 Jan 2021, Last Modified: 28 Apr 2023 · ICML 2021
Abstract: Stochastic Gradient Descent (SGD) is a popular tool in training large-scale machine learning models. Its performance, however, is highly variable, depending crucially on the choice of the step size...
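The abstract's two title schedules can be illustrated with a minimal sketch. The formulas below are the standard textbook forms of exponential decay and cosine annealing, not code from the paper itself; the function and parameter names (`eta0`, `alpha`, `T`) are illustrative assumptions.

```python
import math

def exponential_step_size(eta0: float, alpha: float, t: int) -> float:
    # Standard exponential decay: eta_t = eta0 * alpha^t, with alpha in (0, 1).
    # (Illustrative form; the paper may parameterize the decay differently.)
    return eta0 * (alpha ** t)

def cosine_step_size(eta0: float, t: int, T: int) -> float:
    # Standard cosine annealing: eta_t = (eta0 / 2) * (1 + cos(pi * t / T)),
    # which decreases smoothly from eta0 at t=0 to 0 at t=T.
    return 0.5 * eta0 * (1 + math.cos(math.pi * t / T))
```

Both schedules require only an initial step size (plus a decay rate or horizon), which is the simplicity the title refers to: no per-coordinate statistics are maintained, unlike adaptive methods such as Adam.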