Keywords: Learned Optimizers, Optimization
Abstract: Recent works have demonstrated that learned optimizers (LOs) can be competitive and at times
outperform hand-designed counterparts, paving a path towards improved optimizers by scaling up
LOs. However, learned optimizers still require substantial meta-learning compute, which limits
their scalability and calls for new methods that allow them to generalize to a wider array of problems
from a smaller set of meta-learning problems. One aspect of this is the training-horizon mismatch between
meta-learning and real-world training. We consider the problem of efficiently meta-learning LOs
that can generalize to long training time horizons. We propose LoLO, which employs a replay
buffer to efficiently extend unroll length during meta-training without increasing meta-learning
cost. Furthermore, it incorporates on-policy imitation learning to ensure faithful trajectories and
stabilize meta-training. We evaluate LoLO on a variety of vision and language tasks, demonstrating
its success in achieving long unroll generalization in practical scenarios.
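Below is a minimal, self-contained sketch of the replay-buffer mechanism described in the abstract: states reached by earlier truncated unrolls are stored, and later unrolls resume from them, so meta-training covers longer effective horizons without lengthening any single unroll or its cost. The toy quadratic task, the scalar "learned optimizer" (a single learned step size), the finite-difference meta-gradient, and all names (ReplayBuffer, inner_unroll, meta_train) are illustrative assumptions, not LoLO's actual design; the on-policy imitation-learning component is not shown.

```python
# Illustrative sketch of extending unroll horizons via a replay buffer; not LoLO's code.
import random

class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.states = []  # each entry: (optimizee parameter, inner-step count so far)

    def add(self, state):
        if len(self.states) >= self.capacity:
            self.states.pop(random.randrange(len(self.states)))
        self.states.append(state)

    def sample(self):
        return random.choice(self.states)

def inner_unroll(step_size, start, unroll_len):
    """Run one truncated unroll on a toy quadratic loss f(x) = x^2."""
    x, t = start
    for _ in range(unroll_len):
        grad = 2.0 * x
        x = x - step_size * grad   # the "learned optimizer" is just a learned step size
        t += 1
    meta_loss = x ** 2             # loss at the end of the truncated unroll
    return meta_loss, (x, t)

def meta_train(num_meta_steps=200, unroll_len=10, restart_prob=0.5, meta_lr=1e-3):
    buffer = ReplayBuffer()
    step_size = 0.01               # meta-parameter being learned
    for _ in range(num_meta_steps):
        # Resume from a state reached by a previous unroll, or start a fresh trajectory.
        if buffer.states and random.random() < restart_prob:
            start = buffer.sample()
        else:
            start = (random.uniform(-2.0, 2.0), 0)
        loss, final_state = inner_unroll(step_size, start, unroll_len)
        buffer.add(final_state)    # later unrolls can continue from here -> longer horizons
        # Finite-difference meta-gradient (placeholder for a real meta-gradient estimator).
        eps = 1e-4
        loss_plus, _ = inner_unroll(step_size + eps, start, unroll_len)
        meta_grad = (loss_plus - loss) / eps
        step_size -= meta_lr * meta_grad
    return step_size

if __name__ == "__main__":
    print("meta-learned step size:", meta_train())
```

Because resumed trajectories accumulate inner-step counts across meta-iterations, each individual meta-training step still back-propagates through only a short unroll, while the learned optimizer is exposed to states from much later in training.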
Submission Number: 139