# Training recipes 

We provide the specific commands and hyper-parameters for ViTs, ResNets, and ConvNexts in this recipe.



## Training of ViT

### 1) Training with Setting I

This is a prevalent setting for training [ResNets](https://arxiv.org/abs/2110.00476). To train ViT-small, you can use the following command.

```bash
python -m torch.distributed.launch --nproc_per_node=8 ./train.py \
    --data-dir ${IMAGENET_DIR} \
    --model deit_small_patch16_224 \
    --sched cosine -j 10 \
    --epochs ${EPOCH} --weight-decay 0.02 \
    --opt Adan \
    --lr 1.5e-2  --opt-betas 0.98 0.92 0.99 \
    --opt-eps 1e-8 --max-grad-norm 0.0 \
    --warmup-lr 1e-8 --min-lr 1.0e-08 \
    -b 256 --amp \
    --aug-repeats 0 \
    --warmup-epochs 60 \
    --aa rand-m7-mstd0.5-inc1 \
    --smoothing 0.1 \
    --remode pixel \
    --reprob 0.0 \
    --bce \
    --drop 0.0 --drop-path 0.05 \
    --mixup 0.2 --cutmix 1.0 \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}
```

After training, this command should give the following results. Note that this setting does not seem to improve on the results of ViT-Base obtained under training Setting II (see below).

|           |                          150 Epoch                           |                          300 Epoch                           |
| :-------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| ViT small |                             80.1                             |                             81.1                             |
| download  | [config](./exp_results/ViT/small/args_vit-s_150-I.yaml)/[log](./exp_results/ViT/small/summary_vit-s_150-I.csv)/model | [config](./exp_results/ViT/small/args_vit-s_300-I.yaml)/[log](./exp_results/ViT/small/summary_vit-s_300-I.csv)/model |
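
The `--sched cosine`, `--warmup-epochs`, `--warmup-lr`, `--lr`, and `--min-lr` flags in the command above jointly determine the learning-rate curve. The sketch below shows one simplified reading of that curve: a linear warmup followed by cosine decay, evaluated once per epoch. The function name `lr_at_epoch` is hypothetical and `EPOCH = 300` is assumed; the scheduler actually used by `train.py` may differ in details, such as how the warmup and decay phases are joined.

```python
import math

def lr_at_epoch(epoch, total_epochs, base_lr, warmup_lr, min_lr, warmup_epochs):
    """Simplified per-epoch learning rate: linear warmup, then cosine decay."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr up to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Flag values from the Setting I command; EPOCH = 300 is an assumption.
for epoch in (0, 30, 60, 180, 300):
    print(epoch, lr_at_epoch(epoch, 300, 1.5e-2, 1e-8, 1e-8, 60))
```

In this reading, the learning rate peaks at `--lr` once the 60 warmup epochs end and returns to `--min-lr` at the final epoch.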





### 2) Training with Setting II

This is the official setting used in [DeiT](https://github.com/facebookresearch/deit). Note that, without distillation, DeiTs and ViTs are the same models. To train ViT-Small or ViT-Base, you can use the following command, filling in the variables from the table below.

```bash
python -m torch.distributed.launch --nproc_per_node=8 ./train.py \
    --data-dir ${IMAGENET_DIR} \
    --model ${MODEL_NAME} \
    --sched cosine -j 10 \
    --epochs ${EPOCH} --weight-decay .02 \
    --opt Adan \
    --lr 1.5e-2  --opt-betas 0.98 0.92 0.99 \
    --opt-eps 1e-8 --max-grad-norm 5.0 \
    --warmup-lr 1e-8 --min-lr 1e-5 \
    -b 256 --amp \
    --aug-repeats ${REP} \
    --warmup-epochs 60 \
    --aa ${AUG}  \
    --smoothing 0.1 \
    --remode pixel \
    --reprob 0.25 \
    --drop 0.0 --drop-path ${Dp} \
    --mixup 0.8 --cutmix 1.0 \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}
```
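There are some differences between the hyper-parameters for ViT-Base and ViT-Small, listed in the table below. `--bce` means using the Binary Cross Entropy loss; add it to the command when the BCE column is True. Likewise, add `--bias-decay` when the Bias-Decay column is True.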

|           |       MODEL_NAME       | REP  |         AUG          |  BCE  | Bias-Decay | Drop-path |
| --------- | :--------------------: | :--: | :------------------: | :---: | :--------: | :-------: |
| ViT-Small | deit_small_patch16_224 |  0   | rand-m7-mstd0.5-inc1 | True  |   False    |    0.1    |
| ViT-Base  | deit_base_patch16_224  |  3   | rand-m9-mstd0.5-inc1 | False |    True    |    0.2    |
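
As a rough illustration, the table above can be turned into the model-specific part of the command line. The dictionary `VIT_SETTINGS` and the helper `vit_model_args` are hypothetical names. The sketch assumes that the BCE and Bias-Decay columns correspond to the `--bce` and `--bias-decay` flags used in the other commands on this page.

```python
# Per-model values copied from the table above.
VIT_SETTINGS = {
    "ViT-Small": dict(model="deit_small_patch16_224", rep=0,
                      aug="rand-m7-mstd0.5-inc1", bce=True,
                      bias_decay=False, drop_path=0.1),
    "ViT-Base": dict(model="deit_base_patch16_224", rep=3,
                     aug="rand-m9-mstd0.5-inc1", bce=False,
                     bias_decay=True, drop_path=0.2),
}

def vit_model_args(name):
    """Return the model-specific command-line arguments for Setting II."""
    s = VIT_SETTINGS[name]
    args = ["--model", s["model"],
            "--aug-repeats", str(s["rep"]),
            "--aa", s["aug"],
            "--drop-path", str(s["drop_path"])]
    # Boolean columns are assumed to map to on/off flags.
    if s["bce"]:
        args.append("--bce")
    if s["bias_decay"]:
        args.append("--bias-decay")
    return args

print(" ".join(vit_model_args("ViT-Small")))
print(" ".join(vit_model_args("ViT-Base")))
```

The remaining arguments (optimizer, schedule, mixup, and so on) are shared by both models and stay as written in the command.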

After training, you should expect the following results. **Note that ViT-Base (300 epochs) was trained with the faster version of Adan (`foreach=True`)**. For more details and settings, please refer to the corresponding configuration files.

|           |                          150 Epoch                           |                          300 Epoch                           |
| :-------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| ViT-Small |                             79.6                             |                             80.9                             |
| download  | [config](./exp_results/ViT/small/args_vit-s_150.yaml)/[log](./exp_results/ViT/small/summary_vit-s_150.csv)/model | [config](./exp_results/ViT/small/args_vit-s_300.yaml)/[log](./exp_results/ViT/small/summary_vit-s_300.csv)/model |
| ViT-Base  |                             81.7                             |                             82.6                             |
| download  | [config](./exp_results/ViT/base/args_vit-B_150.yaml)/[log](./exp_results/ViT/base/summary_vit-B_150.csv)/model | [config](./exp_results/ViT/base/args_vit-B_300_T.yaml)/[log](./exp_results/ViT/base/summary_vit-B_300_T.csv)/model |
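
To inspect one of the linked training logs, a small helper like the one below may be handy. It is only a sketch: the function name `best_top1` is hypothetical, and the column names `epoch` and `eval_top1` are an assumption about the CSV layout that should be checked against the actual file header.

```python
import csv
import io

def best_top1(lines, column="eval_top1"):
    """Return (epoch, value) for the row with the highest value in `column`."""
    rows = list(csv.DictReader(lines))
    best = max(rows, key=lambda r: float(r[column]))
    return int(best["epoch"]), float(best[column])

# Dummy log content for illustration only; these are not real results.
sample = io.StringIO(
    "epoch,train_loss,eval_loss,eval_top1,eval_top5\n"
    "0,6.9,6.8,1.0,4.0\n"
    "1,6.5,6.1,5.0,15.0\n"
    "2,6.0,6.3,4.0,14.0\n"
)
print(best_top1(sample))  # (1, 5.0)

# With a real log file (path taken from the table above):
# with open("./exp_results/ViT/small/summary_vit-s_150.csv", newline="") as f:
#     print(best_top1(f))
```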



## ResNet-50
This is a default setting used to train [ResNets](https://arxiv.org/abs/2110.00476). To train ResNet-50, you can use the following command.

```bash
python -m torch.distributed.launch --nproc_per_node=8 ./train.py \
    --data-dir ${IMAGENET_DIR} \
    --model resnet50 \
    --sched cosine -j 8 \
    --epochs ${EPOCH} --weight-decay .02 \
    --opt Adan \
    --lr ${LR}  --opt-betas 0.98 0.92 0.99 \
    --opt-eps 1e-8 --max-grad-norm 5.0 \
    --warmup-lr 1e-9 --min-lr 1.0e-05 --bias-decay \
    -b 256 --amp \
    --aug-repeats 0 \
    --warmup-epochs 60 \
    --aa rand-m7-mstd0.5-inc1 \
    --smoothing 0.0 \
    --remode pixel \
    --crop-pct 0.95 \
    --reprob 0.0 \
    --bce \
    --drop 0.0 --drop-path 0.05 \
    --mixup 0.1 --cutmix 1.0 \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}
```

We use slightly different learning rates for different training lengths, namely `LR = 3e-2` for `EPOCH = 100` and `LR = 1.5e-2` for `EPOCH = 200` and `300`. After training, you should get the following results:

|           |                          100 Epoch                           |                          200 Epoch                           |                          300 Epoch                           |
| :-------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| ResNet-50 |                             78.1                             |                             79.7                             |                             80.2                             |
| download  | [config](./exp_results/ResNet/Res50/args_res50_100.yaml)/[log](./exp_results/ResNet/Res50/summary_res50_100.csv)/model | [config](./exp_results/ResNet/Res50/args_res50_200.yaml)/[log](./exp_results/ResNet/Res50/summary_res50_200.csv)/model | [config](./exp_results/ResNet/Res50/args_res50_300.yaml)/[log](./exp_results/ResNet/Res50/summary_res50_300.csv)/model |



## ResNet-101

To train ResNet-101, you may use the following command.

```bash
python -m torch.distributed.launch --nproc_per_node=8 train.py \
    --data-dir ${IMAGENET_DIR} \
    --model resnet101 \
    --sched cosine -j 8 \
    --epochs 300 --weight-decay .02 \
    --lr 1.5e-2  --warmup-lr 1e-9 --min-lr 1.0e-05 \
    -b 256 --amp --opt adan --opt-betas 0.98 0.92 0.99 --opt-eps 1e-8 \
    --max-grad-norm 5 \
    --bias-decay \
    --aug-repeats 0 \
    --warmup-epochs 90 \
    --aa rand-m7-mstd0.5-inc1 \
    --smoothing 0.0 \
    --remode pixel \
    --bce-loss \
    --crop-pct 0.95 \
    --reprob 0.0 \
    --drop 0.0 --drop-path 0.2 \
    --mixup 0.1 --cutmix 1.0 \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}
```

We use slightly different learning rates, namely `LR = 1e-2` for `EPOCH = 100` and `LR = 1.5e-2` for `EPOCH = 200` and `300`; the command above shows the 300-epoch values. For more detailed training settings, please refer to the following configuration files. **Note that the results for 100 and 300 epochs were obtained with the faster version of Adan (`foreach=True`).**

|            |                          100 Epoch                           |                          200 Epoch                           |                          300 Epoch                           |
| :--------: | :----------------------------------------------------------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| ResNet-101 |                             80.0                             |                             81.6                             |                             81.9                             |
|  download  | [config](./exp_results/ResNet/Res101/args_res101_100.yaml)/[log](./exp_results/ResNet/Res101/summary_res101_100.csv)/model | [config](./exp_results/ResNet/Res101/args_res101_200.yaml)/[log](./exp_results/ResNet/Res101/summary_res101_200.csv)/model | [config](./exp_results/ResNet/Res101/args_res101_300.yaml)/[log](./exp_results/ResNet/Res101/summary_res101_300.csv)/model |
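
The learning rates stated in the ResNet-50 and ResNet-101 sections depend on both the model and the number of epochs, so a small lookup can help avoid mixing them up when scripting several runs. The names `LEARNING_RATES` and `lookup_lr` are hypothetical; the values are the ones quoted in the text, and combinations not mentioned there are deliberately rejected rather than guessed.

```python
# (model, epochs) -> learning rate, as stated in the two ResNet sections.
LEARNING_RATES = {
    ("resnet50", 100): 3e-2,
    ("resnet50", 200): 1.5e-2,
    ("resnet50", 300): 1.5e-2,
    ("resnet101", 100): 1e-2,
    ("resnet101", 200): 1.5e-2,
    ("resnet101", 300): 1.5e-2,
}

def lookup_lr(model, epochs):
    """Return the documented learning rate, or raise if none is documented."""
    try:
        return LEARNING_RATES[(model, epochs)]
    except KeyError:
        raise ValueError(f"no documented learning rate for {model} at {epochs} epochs") from None

for model in ("resnet50", "resnet101"):
    for epochs in (100, 200, 300):
        print(f"--model {model} --epochs {epochs} --lr {lookup_lr(model, epochs)}")
```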



## ConvNext

This is a default setting for training ConvNext-tiny. You can use the following command.

```bash
python -m torch.distributed.launch --nproc_per_node=8 ./train.py \
    --data-dir ${IMAGENET_DIR} \
    --model convnext_tiny_hnf \
    --sched cosine -j 8 \
    --epochs ${EPOCH} --weight-decay .02 \
    --opt Adan \
    --lr 1.6e-2  --opt-betas 0.98 0.92 0.90 \
    --opt-eps 1e-8 --max-grad-norm 0.0 \
    --warmup-lr 1e-9 --min-lr 1.0e-05 --bias-decay \
    -b 256 --amp \
    --aug-repeats 0 \
    --warmup-epochs 150 \
    --aa rand-m7-mstd0.5-inc1 \
    --smoothing 0.1 \
    --remode pixel \
    --reprob 0.25 \
    --drop 0.0 --drop-path 0.1 \
    --mixup 0.8 --cutmix 1.0 \
    --model-ema \
    --train-interpolation random \
    --output ${OUT_DIR} \
    --experiment ${EXP_DIR}
```

For this training, the performance is NOT sensitive to some hyper-parameters, such as `warmup-epochs` and `lr`. However, whether `model-ema` is used plays a key role.
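
For readers unfamiliar with `model-ema`: the option keeps an exponential moving average (EMA) of the model weights alongside the weights being trained. The sketch below illustrates the general idea on plain Python floats. The function name `ema_update` and the decay of 0.9 are assumptions chosen for readability; the training script operates on tensors and uses its own decay setting.

```python
def ema_update(ema_weights, model_weights, decay):
    """Blend the current weights into the running average, in place."""
    for name, value in model_weights.items():
        ema_weights[name] = decay * ema_weights[name] + (1.0 - decay) * value
    return ema_weights

# Toy example with a single scalar "weight" that stays at 1.0.
ema = {"w": 0.0}
for current in (1.0, 1.0, 1.0):
    ema_update(ema, {"w": current}, decay=0.9)
    print(ema["w"])
```

The averaged value moves toward the current weight gradually, which smooths out step-to-step fluctuations in the weights.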

You can use the following config to train ConvNext-tiny for 150 epochs, in which we do not utilize `model-ema`.

The results should be:

|               |                          150 Epoch                           |                          300 Epoch                           |
| :-----------: | :----------------------------------------------------------: | :----------------------------------------------------------: |
| ConvNext-tiny |                             81.7                             |                             82.4                             |
|   download    | [config](./exp_results/ConvNext/small/args_cvnext_150.yaml)/[log](./exp_results/ConvNext/small/summary_cvnext_150.csv)/model | [config](./exp_results/ConvNext/small/args_cvnext_300.yaml)/[log](./exp_results/ConvNext/small/summary_cvnext_300.csv)/model |

