Scheduling the Learning Rate Via Hypergradients: New Insights and a New Algorithm

Anonymous

Sep 25, 2019 · ICLR 2020 Conference Blind Submission
  • Keywords: automl, hyperparameter optimization, learning rate, deep learning
  • TL;DR: MARTHE: a new method to fit task-specific learning rate schedules from the perspective of hyperparameter optimization
  • Abstract: We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization. This allows us to explicitly search for schedules that achieve good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rates, the hypergradient, and based on this we introduce a novel online algorithm. Our method adaptively interpolates between two recently proposed techniques (Franceschi et al., 2017; Baydin et al., 2018), featuring increased stability and faster convergence. We show empirically that the proposed technique compares favorably with baselines and related methods in terms of final test accuracy.
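
For context, below is a minimal sketch of hypergradient-based learning rate adaptation in the spirit of the Baydin et al. (2018) update rule, one of the two endpoints the abstract says MARTHE interpolates between. This is not the MARTHE algorithm itself; the function `hypergradient_descent`, its parameters, and the toy objective are all illustrative assumptions.

```python
# A minimal sketch of online learning-rate adaptation via hypergradients,
# following the Baydin et al. (2018) rule referenced in the abstract.
# Names and hyperparameter values are illustrative, not from the paper.
import numpy as np

def hypergradient_descent(grad_fn, theta, alpha=0.01, beta=1e-4, steps=100):
    """SGD whose learning rate alpha is itself updated by gradient descent.

    Since theta_t = theta_{t-1} - alpha * grad_f(theta_{t-1}), the gradient
    of the loss w.r.t. alpha (the hypergradient) at step t is
    -grad_f(theta_t) . grad_f(theta_{t-1}); descending it gives the update
    alpha <- alpha + beta * grad_f(theta_t) . grad_f(theta_{t-1}).
    """
    prev_grad = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        # Increase alpha when successive gradients align (a larger step
        # would have helped); decrease it when they point apart.
        alpha = alpha + beta * np.dot(g, prev_grad)
        theta = theta - alpha * g
        prev_grad = g
    return theta, alpha

# Example: minimize the quadratic f(theta) = 0.5 * ||theta||^2,
# whose gradient is simply theta.
theta_star, alpha_final = hypergradient_descent(lambda th: th,
                                                np.array([3.0, -2.0]))
print(theta_star, alpha_final)
```

Per the abstract, MARTHE differs from this baseline in that it targets the *validation* error and adaptively interpolates between this one-step rule and the longer-horizon hypergradient of Franceschi et al. (2017).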