TL;DR: We present a method to automatically tune the learning rate while training DNNs, and match or beat the generalization accuracy of state-of-the-art learning rate schedules on ImageNet (ResNet-50), CIFAR-10 (ResNet-18), IWSLT (Transformer), and SQuAD (BERT).
Abstract: One of the most important hyperparameters for training deep neural networks
is the learning rate of the optimizer. The choice of learning rate schedule determines
the computational cost of getting close to a minimum, how close the minimum is
actually approached, and, most importantly, the kind of local minimum (wide or
narrow) that is attained. The kind of minimum attained has a significant impact on
the generalization accuracy of the network. Current systems employ hand-tuned
learning rate schedules, which are painstakingly tuned for each network and
dataset. Given that the space of possible schedules is huge, finding a
satisfactory learning rate schedule can be very time consuming. In this paper,
we present AutoLR, a method for auto-tuning the learning rate as training
proceeds. Our method works with any optimizer, and we demonstrate results with
the SGD, Momentum, and Adam optimizers.
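As a rough illustration of this optimizer-agnostic interface, the sketch below shows how a hypothetical auto-tuning scheduler could wrap a standard optimizer in a PyTorch-style training loop. The AutoTuneLR class, its loss-trend heuristic, and the constants are illustrative assumptions only; the abstract does not specify the actual AutoLR update rule.

    # Minimal sketch (assumes a PyTorch-style API). AutoTuneLR and its loss-trend
    # heuristic are hypothetical stand-ins, not the AutoLR method itself.
    import torch

    class AutoTuneLR:
        """Adjusts the learning rate each step from an observed training signal."""
        def __init__(self, optimizer, shrink=0.5, grow=1.05):
            self.optimizer = optimizer
            self.shrink, self.grow = shrink, grow
            self.prev_loss = None

        def step(self, loss):
            # Toy heuristic: grow the LR while the loss improves, shrink it otherwise.
            if self.prev_loss is not None:
                factor = self.grow if loss < self.prev_loss else self.shrink
                for group in self.optimizer.param_groups:
                    group["lr"] *= factor
            self.prev_loss = loss

    model = torch.nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # SGD, Momentum, or Adam
    scheduler = AutoTuneLR(optimizer)

    for _ in range(100):
        x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step(loss.item())  # scheduler only needs the loss, not the optimizer internals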
We extensively evaluate AutoLR across multiple datasets, models, and
optimizers. We compare favorably against state-of-the-art learning rate
schedules for the given datasets and models, including ResNet-50 on ImageNet,
ResNet-18 on CIFAR-10, and BERT fine-tuning on SQuAD. For example, AutoLR
achieves an EM score of 81.2 on SQuAD v1.1 with BERT_BASE, compared to the 80.8
reported by Devlin et al. (2018), just by auto-tuning the learning rate
schedule. To the best of our knowledge, this is the first automatic learning
rate tuning scheme to achieve state-of-the-art generalization accuracy on these
datasets with the given models.
Keywords: Automatic Learning Rate, Deep Learning, Generalization, Stochastic Optimization