Non-greedy Gradient-based Hyperparameter Optimization Over Long Horizons

28 Sept 2020 (modified: 22 Oct 2023)
ICLR 2021 Conference Blind Submission
Readers: Everyone
Keywords: Hyperparameter optimization, Meta-learning
Abstract: Gradient-based meta-learning has gained widespread popularity in few-shot learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient degradation issues. A common workaround is to learn meta-parameters online, but this introduces greediness, which comes with a significant performance drop. In this work, we enable non-greedy meta-learning of hyperparameters over long horizons by sharing hyperparameters that are contiguous in time, and using the sign of hypergradients rather than their magnitude to indicate convergence. We implement this with forward-mode differentiation, which we extend to the popular momentum-based SGD optimizer. We demonstrate that the hyperparameters of this optimizer can be learned non-greedily without gradient degradation over $\sim 10^4$ inner gradient steps, while requiring only $\sim 10$ outer gradient steps. On CIFAR-10, we outperform greedy and random search methods with the same computational budget by nearly $10\%$. Code will be available upon publication.
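To make the forward-mode idea in the abstract concrete, below is a minimal sketch (not the authors' code) of propagating the tangent of a single learning rate through momentum SGD and then updating the learning rate with the sign of the resulting hypergradient. The toy quadratic inner loss, the outer loss, the multiplicative sign-based outer update, and all names (`A`, `b`, `eta`, `mu`, `T`, `outer_lr`) are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: forward-mode hypergradient of the learning rate through
# T momentum-SGD steps on a toy quadratic, followed by a sign-based outer update.
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((d, d))
A = A @ A.T / d + np.eye(d)                     # SPD curvature of the inner loss
b = rng.standard_normal(d)
w_star = np.linalg.solve(A, b)                  # inner-loss minimizer (for the outer loss)

def inner_grad(w):                              # gradient of 0.5*w'Aw - b'w
    return A @ w - b

def hvp(w, v):                                  # Hessian-vector product (exact here)
    return A @ v

def unroll(eta, mu=0.9, T=1000):
    """Run T momentum-SGD steps, propagating d(state)/d(eta) in forward mode."""
    w = np.zeros(d); v = np.zeros(d)
    dw = np.zeros(d); dv = np.zeros(d)          # tangents of w and v w.r.t. eta
    for _ in range(T):
        g = inner_grad(w)
        dg = hvp(w, dw)                         # d g(w)/d eta via the chain rule
        dv = mu * dv - g - eta * dg             # tangent of v <- mu*v - eta*g
        dw = dw + dv                            # tangent of w <- w + v
        v = mu * v - eta * g
        w = w + v
    return w, dw

eta, outer_lr = 1e-3, 0.5
for step in range(10):                          # ~10 outer steps, as in the abstract
    w_T, dw_T = unroll(eta)
    outer_loss = 0.5 * np.sum((w_T - w_star) ** 2)
    hypergrad = (w_T - w_star) @ dw_T           # dL_outer/d eta from the forward tangent
    eta *= np.exp(-outer_lr * np.sign(hypergrad))   # use only the sign, not the magnitude
    print(f"outer step {step}: eta={eta:.2e}  outer loss={outer_loss:.3e}")
```

Because the tangent is carried forward alongside the inner optimizer state, memory stays constant in the number of inner steps, which is what makes horizons of $\sim 10^4$ steps feasible; the sign-only outer update shown here is one simple way to use the hypergradient's direction without its (possibly degraded) magnitude.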
One-sentence Summary: Learning hyperparameters over long horizons without greediness
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Community Implementations: [2 code implementations](https://www.catalyzex.com/paper/arxiv:2007.07869/code)
Reviewed Version (pdf): https://openreview.net/references/pdf?id=yFMg9lw8Xn