# A SPHERICAL ANALYSIS OF ADAM WITH BATCH NORMALIZATION
### Table of Contents
- [Abstract](#abstract)
- [Setup](#setup)
- [Usage](#usage)
- [Reproduce the results](#reproduce-the-results)


## Abstract
Batch Normalization (BN) is a prominent deep learning technique. In spite of its apparent simplicity, its implications over optimization are yet to be fully understood. While previous studies mostly focus on the interaction between BN and SGD, we develop a geometric perspective which allows us to precisely characterize the relation between BN and Adam. More precisely, we leverage the radial invariance of groups of parameters, such as filters for convolutional neural networks, to translate the optimization steps on the L<sub>2</sub> unit hypersphere. This formulation and the associated geometric interpretation shed new light on the training dynamics. Firstly, we use it to derive the first effective learning rate expression of Adam. Then we show that, in the presence of BN layers, performing SGD alone is actually equivalent to a variant of Adam constrained to the unit hypersphere. Finally, our analysis outlines phenomena that previous variants of Adam act on, and we experimentally validate their importance in the optimization process.

This folder implements the variants of Adam and provides a `training.py` script to reproduce the results presented in the paper.

## Setup
To use the package properly you need Python 3, and CUDA 10 is recommended for acceleration. The installation is as follows:

Install this folder and the dependencies using pip:
```bash
$ pip install -e custom_adam
```

With this editable install, you can edit the code on the fly and import the functions and classes of the package in other projects as well.

Optional: to uninstall this package, run:
```bash
$ pip uninstall custom_adam
```

To import the package you just need:
```python
import custom_adam
```
The package contains PyTorch `Optimizer` classes for the new variants proposed in the paper, as well as utilities to load classic models and datasets.
```
custom_adam.
    models.
        resnet18, resnet20, vgg16
    dataloaders.
        get_dataloader_cifar10, get_dataloader_cifar100, get_dataloader_SVHN
    optimizers.
        AdamA, AdamAB, AdamABC, AdamW, AdamG
```

## Usage
These optimizers are built to apply a specific treatment to layers followed by BN (or another normalization layer).
To use them with PyTorch, you need PyTorch parameter groups (see the [doc](https://pytorch.org/docs/stable/optim.html#per-parameter-options)).
Parameter groups let you specify which parameters are followed by a normalization layer and activate the special treatment option `channel_wise=True` for those parameters.

The typical use for 2D convolutional networks where a convolutional layer is followed by a BN layer looks like:
```python
from custom_adam import AdamABC

# Parameters followed by a normalization layer get the channel-wise treatment.
par_groups = [{'params': model.conv_params(), 'channel_wise': True},
              {'params': model.other_params()}]
optimizer = AdamABC(par_groups, lr=0.001, betas=(0.9, 0.99), weight_decay=1e-4)

# One optimization step.
optimizer.zero_grad()
loss_fn(model(input), target).backward()
optimizer.step()
```
A typical implementation of the methods that filter the parameters, for standard networks such as the [torchvision models](https://pytorch.org/docs/stable/torchvision/models.html), is:
```python
class CustomModel(MyModel):
    ...

    def conv_params(self):
        conv_params = []
        for name, param in self.named_parameters():
            if any(key in name for key in {'conv', 'downsample.0'}):
                conv_params.append(param)
        return conv_params
```
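The usage example also calls `model.other_params()`, which is not shown above. As a minimal sketch, assuming it is simply the complement of `conv_params()`, both lists can be built in a single pass over the named parameters. The helper name `split_named_params` is hypothetical and not part of the package; plain strings stand in for tensors so the snippet runs without PyTorch.

```python
def split_named_params(named_params, keys=('conv', 'downsample.0')):
    """Split (name, param) pairs into (normalized, other) lists by name.

    Hypothetical helper: a parameter goes into the first list when its
    name contains one of `keys`, and into the second list otherwise.
    """
    normalized, other = [], []
    for name, param in named_params:
        if any(key in name for key in keys):
            normalized.append(param)
        else:
            other.append(param)
    return normalized, other


# Stand-in for model.named_parameters(): strings replace tensors here.
named = [
    ('conv1.weight', 'w0'),
    ('bn1.weight', 'w1'),
    ('layer1.0.conv1.weight', 'w2'),
    ('layer2.0.downsample.0.weight', 'w3'),
    ('layer2.0.downsample.1.weight', 'w4'),
    ('fc.weight', 'w5'),
]
conv_params, other_params = split_named_params(named)

# Same parameter-group layout as in the usage example.
par_groups = [{'params': conv_params, 'channel_wise': True},
              {'params': other_params}]
```

With a real model, `model.named_parameters()` would be passed in place of `named`. Every parameter lands in exactly one group, which avoids handing the same tensor to the optimizer twice.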

## Reproduce the results
To reproduce the results of the paper, you can try all the proposed variants.
As in the paper, training can be done on the public datasets CIFAR10, CIFAR100 and SVHN, with the architectures ResNet18, VGG16 and ResNet20 (the latter only for CIFAR10).
Training uses the best hyperparameters found in a previous grid search (cf. Appendix E, Table 4) for the chosen setting. They are listed in the file `best_hyper_parameters.py`.

The training runs for 400 epochs with a step-wise learning rate schedule with 3 jumps.
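To illustrate what a step-wise schedule with 3 jumps looks like, here is a minimal sketch in plain Python. The milestone epochs `(100, 200, 300)` and the decay factor `0.1` are assumptions chosen for illustration only; the values actually used by the script are not stated here.

```python
def stepwise_lr(epoch, base_lr, milestones, gamma):
    """Return base_lr multiplied by gamma once per milestone already reached."""
    jumps = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** jumps


# Hypothetical values, NOT taken from the paper or from training.py.
MILESTONES = (100, 200, 300)  # three jumps within 400 epochs
GAMMA = 0.1                   # assumed decay factor applied at each jump

# Learning rate at every epoch of a 400-epoch run, starting from 0.001.
schedule = [stepwise_lr(e, 0.001, MILESTONES, GAMMA) for e in range(400)]
```

In PyTorch, `torch.optim.lr_scheduler.MultiStepLR` provides this kind of schedule; whether the training script relies on it is not confirmed here.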

To launch the training, call `training.py` with the proper options:
```bash
cd custom_adam
python training.py --optimizer=adamabc --model=resnet18 --dataloader=cifar100
```
The logs report the training and validation accuracy during training, as well as the test accuracy at the end of training.

Options are:
```python
dataloader in ['cifar10', 'cifar100', 'svhn']
model in ['resnet18', 'resnet20', 'vgg16']
optimizer in ['adam', 'adama', 'adamab', 'adamabc', 'adamw', 'adamg']
```
`resnet20` is only available for `cifar10`.
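The option constraints above can be checked before launching a long run. The following is a hedged sketch using `argparse`; it is not the parser of `training.py`. The function name `parse_options` and the choice to make every option required are assumptions.

```python
import argparse

DATALOADERS = ['cifar10', 'cifar100', 'svhn']
MODELS = ['resnet18', 'resnet20', 'vgg16']
OPTIMIZERS = ['adam', 'adama', 'adamab', 'adamabc', 'adamw', 'adamg']


def parse_options(argv=None):
    """Hypothetical option check mirroring the lists above."""
    parser = argparse.ArgumentParser(description='Option check (sketch)')
    # Assumption: all three options are mandatory; the real script may use defaults.
    parser.add_argument('--dataloader', choices=DATALOADERS, required=True)
    parser.add_argument('--model', choices=MODELS, required=True)
    parser.add_argument('--optimizer', choices=OPTIMIZERS, required=True)
    args = parser.parse_args(argv)
    # resnet20 is only available for cifar10.
    if args.model == 'resnet20' and args.dataloader != 'cifar10':
        parser.error('resnet20 is only available for cifar10')
    return args


args = parse_options(['--optimizer=adamabc', '--model=resnet18',
                      '--dataloader=cifar100'])
```

An unsupported combination such as `--model=resnet20 --dataloader=svhn` makes `parser.error` print a message and exit instead of starting the training.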
