# VeLO: Training Versatile Learned Optimizers by Scaling Up

<a href="https://arxiv.org/abs/2211.09760"><img src="https://img.shields.io/badge/arXiv-2211.09760-00ff00.svg" height=20></a>
<a href="https://colab.research.google.com/github/google/learned_optimization/blob/main/learned_optimization/research/general_lopt/Demo_for_training_a_model_with_a_learned_optimizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> While deep learning models have replaced hand-designed features across many domains,
these models are still trained with hand-designed optimizers. In this work, we leverage the same
scaling approach behind the success of deep learning to learn versatile optimizers. We train an
optimizer for deep learning which is itself a small neural network that ingests gradients and
outputs parameter updates. Meta-trained with approximately four thousand TPU-months of
compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling
performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter
tuning, instead automatically adapting to the specifics of the problem being optimized. 
 
## What is VeLO?
VeLO is a learned optimizer: instead of updating parameters with a hand-designed rule such as SGD or Adam, it updates them with a learning rule that was meta-learned on thousands of deep learning tasks. The architecture consists of an LSTM that aggregates information from each tensor in the network being optimized, plus per-parameter MLPs that produce the update for each parameter. The weights of these MLPs are generated by the LSTM. For more info, see our [paper](https://arxiv.org/abs/2211.09760).
 
 <img src="https://raw.githubusercontent.com/velo-code/velo-code.github.io/main/velo_schematic.png" alt="Schematic of VeLO" width="800"/>
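
To make the two-level structure concrete, here is a minimal NumPy sketch of the idea: an LSTM step summarizes one tensor, a linear map turns the LSTM state into the weights of a small MLP, and that MLP is applied to every parameter in the tensor. This is an illustration only, not VeLO's implementation. The feature choices, layer sizes, function names (`tensor_features`, `init_meta_params`, `learned_update`), and the random, untrained meta-parameters are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_cell(x, h, c, W, b):
    """One standard LSTM step; W maps concat([x, h]) to the four gate pre-activations."""
    z = np.concatenate([x, h]) @ W + b
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new


def tensor_features(grad):
    """Hypothetical per-tensor summary statistics (not VeLO's real features)."""
    return np.array([grad.mean(), np.abs(grad).mean(), np.log1p(grad.size)])


def init_meta_params(rng, n_tensor_feat=3, hidden=8, n_param_feat=3, mlp_hidden=4):
    """Random stand-ins for the meta-learned weights (sizes are assumptions)."""
    n_mlp_weights = n_param_feat * mlp_hidden + mlp_hidden  # two layers, no biases
    return {
        "W_lstm": rng.normal(0.0, 0.1, (n_tensor_feat + hidden, 4 * hidden)),
        "b_lstm": np.zeros(4 * hidden),
        "W_hyper": rng.normal(0.0, 0.1, (hidden, n_mlp_weights)),
        "mlp_shape": (n_param_feat, mlp_hidden),
    }


def learned_update(meta, param, grad, momentum, h, c):
    """Apply one update to a single tensor; returns the new tensor and LSTM state."""
    # 1. The per-tensor LSTM aggregates information about this tensor.
    h, c = lstm_cell(tensor_features(grad), h, c, meta["W_lstm"], meta["b_lstm"])

    # 2. The LSTM state generates the weights of the per-parameter MLP.
    n_in, n_hid = meta["mlp_shape"]
    flat = h @ meta["W_hyper"]
    w1 = flat[: n_in * n_hid].reshape(n_in, n_hid)
    w2 = flat[n_in * n_hid:].reshape(n_hid, 1)

    # 3. The same small MLP runs independently on every parameter's features.
    feats = np.stack([param.ravel(), grad.ravel(), momentum.ravel()], axis=-1)
    step = np.tanh(feats @ w1) @ w2
    return param - step.reshape(param.shape), h, c


rng = np.random.default_rng(0)
meta = init_meta_params(rng)
param = rng.normal(size=(5, 3))
grad = rng.normal(size=(5, 3))
momentum = np.zeros_like(param)
h, c = np.zeros(8), np.zeros(8)

new_param, h, c = learned_update(meta, param, grad, momentum, h, c)
```

In a real learned optimizer the entries of `meta` would be the result of meta-training rather than random draws, and the LSTM state `(h, c)` would be carried across training steps for each tensor.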

## What can VeLO do?
VeLO requires no hyperparameter tuning and works out of the box on several large-scale real-world problems! Here is the performance of VeLO on the ML Commons Benchmark:
  
<img src="https://raw.githubusercontent.com/velo-code/velo-code.github.io/main/velo_mlcommons.png" alt="Performance of VeLO on ML Commons Benchmark" width="800"/>
 
## Try out VeLO!
VeLO is available now in our [learned_optimization](https://github.com/google/learned_optimization) package. [This Colab notebook](https://colab.research.google.com/github/google/learned_optimization/blob/main/learned_optimization/research/general_lopt/Demo_for_training_a_model_with_a_learned_optimizer.ipynb) demonstrates how to load VeLO and use it on some common tasks.

If you'd like to train your own optimizer, or perform further research on learned optimization, we have a series of [tutorial notebooks](https://github.com/google/learned_optimization#learned_optimization-tutorial-sequence) in our repo with more pedagogical details.

## Citing VeLO

If you use or build upon VeLO, please cite the original paper:

```
@article{metz2022velo,
  title={{VeLO}: Training Versatile Learned Optimizers by Scaling Up},
  author={Luke Metz and James Harrison and C. Daniel Freeman and Amil Merchant and Lucas Beyer and James Bradbury and Naman Agrawal and Ben Poole and Igor Mordatch and Adam Roberts and Jascha Sohl-Dickstein},
  journal = {arXiv preprint arXiv:2211.09760},
  year = {2022},
  url = {https://velo-code.github.io},
}
```