Scheduling the Learning Rate Via Hypergradients: New Insights and a New Algorithm

25 Sept 2019 (modified: 23 Mar 2025) · ICLR 2020 Conference Blind Submission · Readers: Everyone
Keywords: automl, hyperparameter optimization, learning rate, deep learning
TL;DR: MARTHE: a new method to fit task-specific learning rate schedules from the perspective of hyperparameter optimization
Abstract: We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization. This allows us to explicitly search for schedules that achieve good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rates, the hypergradient, and based on this we introduce a novel online algorithm. Our method adaptively interpolates between two recently proposed techniques (Franceschi et al., 2017; Baydin et al., 2018), featuring increased stability and faster convergence. We show empirically that the proposed technique compares favorably with baselines and related methods in terms of final test accuracy.
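For intuition, here is a minimal sketch of the hypergradient-descent update of Baydin et al. (2018), the simpler of the two techniques the proposed method interpolates between: the learning rate is treated as an online hyperparameter and nudged along the (hyper)gradient of the loss with respect to it. The toy quadratic objective and the hyper-learning rate `beta` below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Sketch of the hypergradient-descent rule (Baydin et al., 2018) on a toy
# objective f(theta) = 0.5 * ||theta||^2. Illustrative only.

def grad(theta):
    # Gradient of the toy quadratic objective.
    return theta

theta = np.array([5.0, -3.0])
eta = 0.01    # learning rate, adapted online
beta = 1e-3   # hyper-learning rate for the eta update (assumed value)
prev_grad = np.zeros_like(theta)

for step in range(100):
    g = grad(theta)
    # Since theta_t = theta_{t-1} - eta * g_{t-1}, the derivative of the loss
    # w.r.t. eta is -g_t . g_{t-1}; descending on it gives the dot-product rule.
    eta += beta * np.dot(g, prev_grad)
    theta -= eta * g
    prev_grad = g

print(f"final eta = {eta:.4f}, loss = {0.5 * np.dot(theta, theta):.6f}")
```

The paper's algorithm (MARTHE) adaptively blends this one-step update with the longer-horizon hypergradient of RTHO (Franceschi et al., 2017) computed on a validation objective; the sketch above shows only one endpoint of that interpolation.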
Community Implementations: 1 code implementation (https://www.catalyzex.com/paper/scheduling-the-learning-rate-via/code)