Learning to Optimize with Recurrent Hierarchical Transformers

Published: 19 Jun 2023, Last Modified: 09 Jul 2023, Frontiers4LCD
Keywords: learned optimization, transformers, meta-learning
TL;DR: We propose an efficient, first-of-its-kind learned optimizer based on a transformer with recurrence that leverages the structure of neural networks to perform optimization.
Abstract: Learning to optimize (L2O) has received a lot of attention recently because of its potential to leverage data to outperform hand-designed optimization algorithms such as Adam. However, learned optimizers can suffer from high meta-training costs and memory overhead. Recent attempts have been made to reduce the computational cost of these learned optimizers by introducing a hierarchy that enables most of the heavy computation to be performed at the tensor (layer) level rather than the parameter level. This not only leads to sublinear memory cost with respect to the number of parameters, but also allows for higher representational capacity for efficient learned optimization. To this end, we propose an efficient transformer-based learned optimizer that facilitates communication among tensors with self-attention and keeps track of optimization history with recurrence. We show that our optimizer converges faster than strong baselines at comparable memory overhead, suggesting encouraging scaling trends.
Submission Number: 98
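
As a rough, illustrative sketch of the design described in the abstract (not the authors' implementation), the snippet below shows one optimizer step in which each parameter tensor is summarized by a small feature vector, tensors communicate through self-attention, a GRU cell carries optimization history across steps, and a per-parameter head maps the tensor embedding together with the gradient to an update. All module choices, feature statistics, and dimensions here are hypothetical.

```python
import torch
import torch.nn as nn


class HierarchicalL2OSketch(nn.Module):
    """Illustrative sketch of a recurrent hierarchical transformer optimizer.

    Heavy computation happens per tensor (layer) rather than per parameter:
    each tensor is reduced to a small feature vector, tensors communicate via
    self-attention, and a GRU carries optimization history across steps.
    All sizes and feature choices are hypothetical.
    """

    def __init__(self, feat_dim=8, hidden_dim=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hidden_dim)  # per-tensor features -> embedding
        layer = nn.TransformerEncoderLayer(
            hidden_dim, n_heads, dim_feedforward=2 * hidden_dim, batch_first=True
        )
        self.attn = nn.TransformerEncoder(layer, num_layers=1)  # communication among tensors
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)            # recurrence over optimization steps
        self.head = nn.Linear(hidden_dim + 1, 1)                 # (tensor embedding, grad) -> update

    @staticmethod
    def tensor_features(p, g):
        # Cheap per-tensor summary statistics (an illustrative choice).
        return torch.stack([
            g.mean(), g.abs().mean(), g.norm(), g.abs().max(),
            p.mean(), p.abs().mean(), p.norm(), p.abs().max(),
        ])

    def step(self, params, grads, state):
        feats = torch.stack([self.tensor_features(p, g)
                             for p, g in zip(params, grads)])  # (n_tensors, feat_dim)
        h = self.embed(feats).unsqueeze(0)                      # (1, n_tensors, hidden_dim)
        h = self.attn(h).squeeze(0)                             # self-attention across tensors
        state = self.rnn(h, state)                              # carry history across steps
        new_params = []
        for s, p, g in zip(state, params, grads):
            per_param = torch.cat([s.expand(g.numel(), -1), g.reshape(-1, 1)], dim=-1)
            update = self.head(per_param).reshape(g.shape)
            new_params.append(p - 1e-3 * update)                # fixed scale, purely illustrative
        return new_params, state
```

In this sketch the recurrent state would be initialized to zeros, one hidden vector per tensor (e.g. `torch.zeros(len(params), 32)`), and threaded through successive calls to `step`; the per-step memory then grows with the number of tensors rather than the number of parameters, which is the kind of sublinear scaling the abstract refers to.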