Abstract: Well-performing deep learning models have enormous impact, but getting them
to perform well is complicated, as the model architecture must be chosen and a
number of hyperparameters tuned. This requires experimentation, which is time-consuming and costly. We propose to address the problem of hyperparameter
tuning by learning to forecast the training behaviour of deep learning architectures.
Concretely, we introduce a forecasting model that, given a hyperparameter schedule
(e.g., learning rate, weight decay) and a history of training observations (such as
loss and accuracy), predicts how the training will continue. Naturally, forecasting
is much faster and less expensive than running actual deep learning experiments.
The main question we study is whether the forecasting model is good enough to be
of use: can it indeed replace real experiments? We answer this affirmatively in two
ways. For one, we show that the forecasted curves are close to real ones. On the
practical side, we apply our forecaster to learn hyperparameter tuning policies. We
experiment on a version of ResNet on CIFAR10 and on a Transformer in a language
modeling task. The policies learned using our forecaster match or exceed the ones
learned in real experiments and in one case even the default schedules discovered
by researchers. We study the learning rate schedules created using the forecaster and find that they are not only effective, but also lead to interesting insights.
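The abstract describes the forecaster's interface: given a hyperparameter schedule and a history of training observations, predict how the curves continue. A minimal sketch of that interface, assuming hypothetical names and a trivial placeholder extrapolation (not the paper's actual model):

```python
# Sketch of the forecaster interface described in the abstract.
# All names and the placeholder logic are illustrative assumptions,
# not the authors' implementation.
from typing import Dict, List, Sequence


class CurveForecaster:
    """Hypothetical model that forecasts training curves."""

    def predict(
        self,
        schedule: Sequence[Dict[str, float]],  # future hyperparameters per step,
                                               # e.g. {"lr": 0.1, "weight_decay": 5e-4}
        history: Sequence[Dict[str, float]],   # observed metrics so far,
                                               # e.g. {"loss": 2.3, "accuracy": 0.41}
    ) -> List[Dict[str, float]]:
        # A learned model conditioned on schedule and history would go here;
        # this stub merely repeats the last observation for each future step.
        last = dict(history[-1])
        return [dict(last) for _ in schedule]


# Usage: forecast the next two steps under a decayed learning rate.
forecaster = CurveForecaster()
future = forecaster.predict(
    schedule=[{"lr": 0.01}, {"lr": 0.001}],
    history=[{"loss": 2.3, "accuracy": 0.41}, {"loss": 1.9, "accuracy": 0.52}],
)
print(future)
```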