Non-greedy Gradient-based Hyperparameter Optimization Over Long Horizons

28 Sept 2020 (modified: 22 Oct 2023)
ICLR 2021 Conference Blind Submission
Readers: Everyone
Keywords: Hyperparameter optimization, Meta-learning
Abstract: Gradient-based meta-learning has gained widespread popularity in few-shot learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn meta-parameters online, but this introduces greediness, which comes with a significant performance drop. In this work, we enable non-greedy meta-learning of hyperparameters over long horizons by sharing hyperparameters that are contiguous in time, and using the sign of hypergradients rather than their magnitude to indicate convergence. We implement this with forward-mode differentiation, which we extend to the popular momentum-based SGD optimizer. We demonstrate that the hyperparameters of this optimizer can be learned non-greedily without gradient degradation over $\sim 10^4$ inner gradient steps, while requiring only $\sim 10$ outer gradient steps. On CIFAR-10, we outperform greedy and random search methods with the same computational budget by nearly $10\%$. Code will be available upon publication.
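To make the forward-mode idea in the abstract concrete, below is a minimal sketch (not the authors' code) of propagating the tangent of a single learning rate through momentum SGD and then updating the learning rate with the sign of the resulting hypergradient. The toy quadratic inner loss, the outer loss, the multiplicative sign-based outer update, and all names (`A`, `b`, `eta`, `mu`, `T`, `outer_lr`) are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: forward-mode hypergradient of the learning rate through
# T momentum-SGD steps on a toy quadratic, followed by a sign-based outer update.
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))
A = A @ A.T / d + np.eye(d)                     # SPD curvature of the inner loss
b = rng.standard_normal(d)
w_star = np.linalg.solve(A, b)                  # inner-loss minimizer (for the outer loss)

def inner_grad(w):                              # gradient of 0.5*w'Aw - b'w
    return A @ w - b

def hvp(w, v):                                  # Hessian-vector product (exact here)
    return A @ v

def unroll(eta, mu=0.9, T=1000):
    """Run T momentum-SGD steps, propagating d(state)/d(eta) in forward mode."""
    w = np.zeros(d); v = np.zeros(d)
    dw = np.zeros(d); dv = np.zeros(d)          # tangents of w and v w.r.t. eta
    for _ in range(T):
        g = inner_grad(w)
        dg = hvp(w, dw)                         # d g(w)/d eta via the chain rule
        dv = mu * dv - g - eta * dg             # tangent of v <- mu*v - eta*g
        dw = dw + dv                            # tangent of w <- w + v
        v = mu * v - eta * g
        w = w + v
    return w, dw

eta, outer_lr = 1e-3, 0.5
for step in range(10):                          # ~10 outer steps, as in the abstract
    w_T, dw_T = unroll(eta)
    outer_loss = 0.5 * np.sum((w_T - w_star) ** 2)
    hypergrad = (w_T - w_star) @ dw_T           # dL_outer/d eta from the forward tangent
    eta *= np.exp(-outer_lr * np.sign(hypergrad))   # use only the sign, not the magnitude
    print(f"outer step {step}: eta={eta:.2e}  outer loss={outer_loss:.3e}")
```

Because the tangent is carried forward alongside the inner optimizer state, memory stays constant in the number of inner steps, which is what makes horizons of $\sim 10^4$ steps feasible; the sign-only outer update shown here is one simple way to use the hypergradient's direction without its (possibly degraded) magnitude.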
One-sentence Summary: Learning hyperparameters over long horizons without greediness
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2007.07869/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=yFMg9lw8Xn