Learning rate optimization through step sampling

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: Learning Rate Optimization, Hyper-parameter Tuning, LR Search, Training Efficiency
Abstract: Modern machine learning models require selecting hyper-parameters prior to training: important variables that define how the model learns, but which cannot be learned by the model itself and must instead be assigned in advance. Of the hyper-parameters that must be selected when configuring a model, arguably the most important is the "learning rate", the step size the model uses when learning its parameters with gradient descent. Here we propose a method to deliberately select a learning rate by training a model for a small number of steps at each of a variety of learning rates, resetting both the model parameters and the dataset between trials. A curve of the log of those rates vs. the losses achieved at each is used to identify a viable range for an optimal learning rate, and we compare several methods of selecting an optimal point within that range. The performance of the selections from these methods is then evaluated using a full grid search, and in our experiments they reliably select learning rates that achieve good accuracy for any given model.
One-sentence Summary: An efficient method for selecting an optimal learning rate by repeatedly training a model for a few steps at different learning rates and analyzing the resulting loss curve.
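
The sketch below illustrates the general procedure the abstract describes, in Python with PyTorch. The toy linear model, synthetic data, candidate grid, and steepest-slope selection rule are all illustrative assumptions, not the authors' experimental setup or the selection methods the paper actually compares.

```python
# A minimal sketch of the step-sampling procedure, assuming a toy PyTorch
# model and synthetic data. The selection heuristic here (steepest descent
# on the log(lr)-vs-loss curve) is only one plausible choice; the paper
# compares several selection methods within the viable range.
import math
import torch
import torch.nn as nn


def sample_lr_losses(make_model, data, n_steps=20, lrs=None):
    """Train a fresh model for a few steps at each candidate learning rate
    and record the final loss, resetting parameters between trials."""
    if lrs is None:
        # Log-spaced candidates spanning 1e-5 .. 1e0 (an assumption).
        lrs = [10 ** e for e in torch.linspace(-5, 0, 20).tolist()]
    x, y = data
    losses = []
    for lr in lrs:
        torch.manual_seed(0)              # identical initialization per trial
        model = make_model()
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(n_steps):          # a small, fixed number of steps
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        losses.append(loss.item())
    return lrs, losses


def select_lr(lrs, losses):
    """Pick the learning rate where loss falls fastest per decade of lr."""
    best_i, best_slope = 0, 0.0
    for i in range(1, len(lrs)):
        slope = (losses[i] - losses[i - 1]) / (
            math.log10(lrs[i]) - math.log10(lrs[i - 1])
        )
        if slope < best_slope:            # most negative slope wins
            best_i, best_slope = i, slope
    return lrs[best_i]


if __name__ == "__main__":
    x = torch.randn(256, 10)
    y = x @ torch.randn(10, 1)
    lrs, losses = sample_lr_losses(lambda: nn.Linear(10, 1), (x, y))
    print("selected learning rate:", select_lr(lrs, losses))
```

Because each trial runs for only a few steps from the same initialization, the full sweep costs a small fraction of one training run, which is what makes the method cheaper than a conventional grid search.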