Scheduling the Learning Rate Via Hypergradients: New Insights and a New Algorithm

25 Sept 2019 (modified: 05 May 2023) · ICLR 2020 Conference Blind Submission · Readers: Everyone
Keywords: automl, hyperparameter optimization, learning rate, deep learning
TL;DR: MARTHE: a new method to fit task-specific learning rate schedules from the perspective of hyperparameter optimization
Abstract: We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization. This allows us to explicitly search for schedules that achieve good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rates, the hypergradient, and based on this we introduce a novel online algorithm. Our method adaptively interpolates between two recently proposed techniques (Franceschi et al., 2017; Baydin et al., 2018), featuring increased stability and faster convergence. We show empirically that the proposed technique compares favorably with baselines and related methods in terms of final test accuracy.
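To make the hypergradient idea concrete, here is a minimal sketch of online hypergradient-based learning rate adaptation in the spirit of one of the cited techniques (Baydin et al., 2018). It is not an implementation of the paper's proposed algorithm; the function name and all parameter values are illustrative assumptions.

```python
import numpy as np

def hypergradient_descent(grad_fn, w, eta=0.01, beta=1e-4, steps=100):
    """Adapt a scalar learning rate online via its hypergradient.

    The derivative of the loss at step t w.r.t. the learning rate used
    at step t-1 is -grad(w_t) . grad(w_{t-1}), so the schedule can be
    adapted online with a small meta step size `beta` (hypothetical
    defaults chosen for this toy example).
    """
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        # hypergradient of the current loss w.r.t. the learning rate
        h = -np.dot(g, prev_grad)
        eta -= beta * h          # adapt the learning rate online
        w = w - eta * g          # ordinary SGD step with the adapted rate
        prev_grad = g
    return w, eta

# usage on a toy quadratic loss L(w) = 0.5 * ||w||^2, whose gradient is w
w, eta = hypergradient_descent(lambda w: w, np.array([1.0, -2.0]))
```

On this quadratic, consecutive gradients point in the same direction, so the hypergradient is negative and the learning rate grows, accelerating convergence; the paper's method instead targets generalization by differentiating a validation error w.r.t. the whole schedule.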
