
# Lions-and-Muons-Optimization-via-Stochastic-Frank-Wolfe

This repository provides guidelines for reproducing the experiments from the paper "Lions and Muons: Optimization via Stochastic Frank-Wolfe".

## Prerequisites

- ```pytorch```
- ```numpy```

## nanoGPT

First, run the preparation script, which downloads the Shakespeare dataset as a single (1MB) file and converts it from raw text into one large stream of integers:

```bash
python data/shakespeare_char/prepare.py
```
This creates ```train.bin``` and ```val.bin``` in that data directory. To train nanoGPT with Lion, run:

```bash
torchrun --standalone --nproc_per_node=1 train_lion.py config/train_shakespeare_char.py
```

Similarly, to train nanoGPT with Muon, run:
```bash
torchrun --standalone --nproc_per_node=1 train_muon.py config/train_shakespeare_char.py
```
All relevant hyperparameters are specified in the configuration file ```config/train_shakespeare_char.py```. Gradient clipping can be turned on or off manually in ```train_lion.py``` and ```train_muon.py```.
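For reference, a clipping switch of this kind can be sketched as follows, assuming clipping by the global 2-norm of all gradients. This is a minimal NumPy sketch, not the code in ```train_lion.py``` or ```train_muon.py```; the names ```use_clip```, ```max_norm``` and ```clip_by_global_norm``` are hypothetical. In PyTorch the same operation is provided by ```torch.nn.utils.clip_grad_norm_(parameters, max_norm)```, which rescales all gradients in place when their combined norm exceeds ```max_norm```.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale a list of gradient arrays so that their joint 2-norm is at most max_norm.

    Follows the behaviour of torch.nn.utils.clip_grad_norm_, but returns new
    arrays (plus the norm before clipping) instead of modifying them in place.
    """
    total_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:                      # only shrink, never enlarge
        grads = [g * scale for g in grads]
    return grads, total_norm


use_clip = True   # hypothetical on/off switch
max_norm = 1.0    # hypothetical clipping threshold

grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]  # joint norm = 13
norm_before = None
if use_clip:
    grads, norm_before = clip_by_global_norm(grads, max_norm)
```

Because Lion and Muon normalize their update direction anyway, clipping mainly changes how much a single large stochastic gradient contributes to the momentum buffer relative to earlier gradients.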

## Synthetic experiment

To run the synthetic-function experiment with Lion, run:

```bash
python main_synthetic_Lion.py
```
The optimizer (Lion or Lion with variance reduction), gradient clipping and hyperparameter settings can be specified in ```main_synthetic_Lion.py```.
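To illustrate the Frank-Wolfe reading of Lion, the sketch below implements the standard Lion update with decoupled weight decay and runs it on a hypothetical noisy quadratic, not on the synthetic function in ```main_synthetic_Lion.py```; the variance-reduced variant is not shown. With learning rate ```lr``` and weight decay ```wd```, the step ```x - lr * (sign(c) + wd * x)``` equals ```(1 - lr*wd) * x + lr*wd * s``` with ```s = -sign(c) / wd```, i.e. a Frank-Wolfe step of size ```lr*wd``` towards a point of the l-infinity ball of radius ```1 / wd```. Provided ```lr*wd <= 1```, an iterate that starts inside that ball therefore never leaves it.

```python
import numpy as np


def lion_step(x, m, grad, lr, wd, beta1=0.9, beta2=0.99):
    """One Lion update with decoupled weight decay; returns (new_x, new_m)."""
    c = beta1 * m + (1.0 - beta1) * grad        # interpolated update direction
    x_new = x - lr * (np.sign(c) + wd * x)      # sign step plus weight decay
    m_new = beta2 * m + (1.0 - beta2) * grad    # momentum: EMA of gradients
    return x_new, m_new


# Hypothetical synthetic problem: f(x) = 0.5 * ||x - b||^2 with noisy gradients.
# Some entries of b deliberately lie outside the ball ||x||_inf <= 1 / wd.
rng = np.random.default_rng(0)
dim = 10
b = rng.uniform(-8.0, 8.0, size=dim)
lr, wd = 0.01, 0.2                # implied l_inf radius: 1 / wd = 5
x = np.zeros(dim)
m = np.zeros(dim)

max_abs = 0.0
for _ in range(3000):
    grad = (x - b) + 0.1 * rng.standard_normal(dim)   # stochastic gradient
    x, m = lion_step(x, m, grad, lr, wd)
    max_abs = max(max_abs, float(np.max(np.abs(x))))

# (1 - lr*wd) * x + lr*wd * (-sign(c) / wd) is a convex combination of two
# points of the ball, so no iterate leaves it.
print(max_abs, 1.0 / wd)
```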

Similarly, to run the synthetic-function experiment with Muon, run:
```bash
python main_synthetic_Muon.py
```
The optimizer (Muon or Muon with variance reduction), gradient clipping and hyperparameter settings can be specified in ```main_synthetic_Muon.py```.
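In Muon, the update applied to a weight matrix is an orthogonalized version of its momentum matrix. In the Frank-Wolfe reading this is the linear minimization oracle over a spectral-norm ball: if ```M = U Σ V^T``` is a thin SVD, then ```D = -U V^T``` minimizes ```<M, D>``` over all ```D``` with spectral norm at most 1. The sketch below is written under these assumptions and is not taken from ```main_synthetic_Muon.py```: it computes ```U V^T``` exactly with an SVD and approximately with the classic cubic Newton-Schulz iteration. Public Muon implementations typically use a tuned quintic Newton-Schulz polynomial with about five steps instead, which only approximately orthogonalizes; the momentum rule in the hypothetical ```muon_step``` is likewise a simplified choice.

```python
import numpy as np


def orthogonalize_svd(m):
    """Exact polar factor U V^T of m, from a thin SVD."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def orthogonalize_newton_schulz(m, steps=50):
    """Approximate U V^T with the cubic Newton-Schulz iteration
    X <- 1.5 X - 0.5 X X^T X, which drives every singular value in
    (0, sqrt(3)) towards 1 while keeping the singular vectors fixed."""
    transposed = m.shape[0] > m.shape[1]
    x = m.T if transposed else m              # work in the wide orientation
    x = x / (np.linalg.norm(x) + 1e-12)       # Frobenius scaling: singular values <= 1
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def muon_step(w, buf, grad, lr, beta=0.95):
    """Hypothetical Muon-style step: heavy-ball momentum, then an orthogonalized update."""
    buf = beta * buf + grad
    w = w - lr * orthogonalize_newton_schulz(buf)
    return w, buf


rng = np.random.default_rng(0)
g = rng.standard_normal((6, 4))
exact = orthogonalize_svd(g)
approx = orthogonalize_newton_schulz(g)
print(np.max(np.abs(exact - approx)))
print(np.linalg.svd(approx, compute_uv=False))   # singular values, all close to 1
```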


## ResNet18

To train ResNet18 on CIFAR-10 with Lion, run:
```bash
python train_lion.py --lr 1e-4 --wd 1e-2
```
Similarly, for Lion++, run:
```bash
python train_lion_VR_clip.py --lr 1e-4 --wd 1e-3
```
To train ResNet18 on CIFAR-10 with Muon, run:
```bash
python train_muon.py --lr 5e-2 --wd 1e-3
```
Similarly, for Muon++, run:
```bash
python train_muon_VR_clip.py --lr 5e-2 --wd 1e-3
```
The learning rate and weight decay are set from the command line with the ```--lr``` and ```--wd``` arguments. Gradient clipping and the remaining hyperparameters are set in the corresponding training scripts.
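A command-line interface of this shape can be built with Python's standard ```argparse``` module. This is a hypothetical sketch: the ```build_parser``` helper and the default values are placeholders, not the defaults used in ```train_lion.py```, ```train_muon.py``` or their ```_VR_clip``` variants.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the two flags documented above."""
    parser = argparse.ArgumentParser(description="Train ResNet18 on CIFAR-10")
    parser.add_argument("--lr", type=float, default=1e-4,   # placeholder default
                        help="learning rate")
    parser.add_argument("--wd", type=float, default=1e-2,   # placeholder default
                        help="weight decay")
    return parser


# Equivalent to: python train_lion.py --lr 1e-4 --wd 1e-2
args = build_parser().parse_args(["--lr", "1e-4", "--wd", "1e-2"])
print(args.lr, args.wd)   # 0.0001 0.01
```

Declaring ```type=float``` lets scientific notation such as ```5e-2``` be passed directly, as in the Muon commands above.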
