This package contains the PyTorch code for the image-classification experiments in the paper:
*Understanding AdamW through Proximal Methods and Scale-freeness*


1. The `src` folder contains the code for training deep neural networks for image classification on CIFAR10/100. You can train models with the `main.py` script, with hyperparameters specified as flags (run it with `--help` for a detailed list and explanation). A conceptual sketch of the difference between the two optimizers compared below follows.
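The `AdamL2` and `AdamW` options differ only in where the weight decay enters the update. The following is a minimal sketch of that difference, not the repository's implementation; the function name and its arguments (`adam_step`, `lr`, `wd`, etc.) are hypothetical.

```python
import torch

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, wd=0.0, decoupled=False):
    """One Adam step on parameter tensor w with gradient g (illustrative only)."""
    if not decoupled:                 # "AdamL2": decay is folded into the gradient,
        g = g + wd * w                # so it gets rescaled by the adaptive step size
    m.mul_(beta1).add_(g, alpha=1 - beta1)          # first-moment estimate
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                    # bias corrections
    v_hat = v / (1 - beta2 ** t)
    w -= lr * m_hat / (v_hat.sqrt() + eps)          # adaptive (scale-free) step
    if decoupled:                     # "AdamW": decay is applied to w directly,
        w -= lr * wd * w              # untouched by the adaptive denominator
    return w
```

Because the decoupled decay bypasses the adaptive denominator, AdamW keeps the scale-freeness of the plain Adam step; this is the property probed by the loss-multiplier experiments below.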


### Reproducing Results

#### ResNet on CIFAR10

```
python main.py --optim-method AdamL2 --eta0 0.001 --weight-decay 0.0005 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet20 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamW --eta0 0.001 --weight-decay 5e-05 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet20 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamL2 --eta0 0.0005 --weight-decay 0.0005 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet44 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamW --eta0 0.0005 --weight-decay 5e-05 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet44 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamL2 --eta0 0.0005 --weight-decay 0.0005 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet56 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamW --eta0 0.0005 --weight-decay 5e-05 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet56 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamL2 --eta0 0.001 --weight-decay 0.0005 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet110 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamW --eta0 0.0005 --weight-decay 0.0001 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet110 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamL2 --eta0 0.0005 --weight-decay 0.005 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet218 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamW --eta0 0.0005 --weight-decay 5e-05 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet218 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000
```


The following two runs keep batch normalization enabled (note the absence of `--no-batch-norm`):

```
python main.py --optim-method AdamL2 --eta0 0.005 --weight-decay 0 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet110 --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamW --eta0 0.005 --weight-decay 0 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR10 --dataroot ../data --dataset CIFAR10 --model ResNet110 --scheduler None --store-stats --store-stats-interval 1000
```




#### DenseNet-BC 100-Layer on CIFAR100

```
python main.py --optim-method AdamL2 --eta0 0.0005 --weight-decay 0.001 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR100 --dataroot ../data --dataset CIFAR100 --model DenseNetBC100 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000

python main.py --optim-method AdamW --eta0 0.001 --weight-decay 5e-5 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR100 --dataroot ../data --dataset CIFAR100 --model DenseNetBC100 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000
```


The following run keeps batch normalization enabled (no `--no-batch-norm`):

```
python main.py --optim-method AdamL2 --eta0 0.005 --weight-decay 0 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/CIFAR100 --dataroot ../data --dataset CIFAR100 --model DenseNetBC100 --scheduler None --store-stats --store-stats-interval 1000
```




#### Multiplying the loss by a positive factor to check scale-freeness
```
python main.py --optim-method AdamL2 --eta0 0.001 --weight-decay 0.005 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/Loss_Mul --dataroot ../data --dataset CIFAR10 --model ResNet110 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000 --loss-multiplier 10

python main.py --optim-method AdamW --eta0 0.0005 --weight-decay 0.0001 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/Loss_Mul --dataroot ../data --dataset CIFAR10 --model ResNet110 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000 --loss-multiplier 10

python main.py --optim-method AdamL2 --eta0 0.0001 --weight-decay 0.1 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/Loss_Mul --dataroot ../data --dataset CIFAR10 --model ResNet110 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000 --loss-multiplier 100

python main.py --optim-method AdamW --eta0 0.0005 --weight-decay 0.0001 --train-epochs 300 --batchsize 128 --eval-interval 1 --use-cuda --log-folder ../logs/Loss_Mul --dataroot ../data --dataset CIFAR10 --model ResNet110 --no-batch-norm --scheduler None --store-stats --store-stats-interval 1000 --loss-multiplier 100
```
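This serves as a scale-freeness check because multiplying the loss by a factor c > 0 multiplies every gradient by c. In AdamW the factor cancels (up to `eps`) between the first moment and the square root of the second moment, so the update is essentially unchanged; in AdamL2 the L2 term is not scaled with the loss, so its relative weight changes. A small self-contained numerical illustration (not the repository's code; the gradients and constants are made up):

```python
import numpy as np

def adam_update(g_seq, lr=1e-3, b1=0.9, b2=0.999, eps=1e-12):
    """Run Adam on a short gradient sequence; return the final update."""
    m = v = 0.0
    for t, g in enumerate(g_seq, start=1):
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        step = lr * (m / (1 - b1 ** t)) / (np.sqrt(v / (1 - b2 ** t)) + eps)
    return step

grads = [0.3, -0.1, 0.25]                            # placeholder gradients
print(adam_update(grads))                            # original loss
print(adam_update([10 * g for g in grads]))          # loss x10: same up to eps

w, wd = 0.5, 1e-2                                    # a fixed weight, L2 strength
print(adam_update([g + wd * w for g in grads]))      # AdamL2-style gradient
print(adam_update([10 * g + wd * w for g in grads])) # L2 term NOT scaled: differs
```

The first two printed values coincide to many digits while the last two do not; the commands above probe the same effect in training via `--loss-multiplier 10` and `100`.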




2. The `utils` folder contains the code for visualizing the results.

Note: all figures are saved to `--fig-save-folder` without being displayed; check that folder for the output.

`draw_heatmap_tr_curve.py` draws, for each optimizer, a heatmap of the final test error over the grid of initial learning rates (`eta0`) and weight decay values, together with the training loss and test accuracy curves of the best setting. For example:

```
python draw_heatmap_tr_curve.py --fig-save-folder ../figs --log-folder ../logs/CIFAR10 --dataset CIFAR10 --model ResNet110 --no-batch-norm --scheduler None --train-epochs 300 --batchsize 128 --loss-multiplier 1 --weight-decay-vals 0 1e-05 5e-05 0.0001 0.0005 0.001 0.005 0.01 --eta0-vals 5e-05 0.0001 0.0005 0.001 0.005 --optim-methods AdamL2 AdamW
```
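For reference, the kind of figure this produces can be sketched as below. This is not `draw_heatmap_tr_curve.py` itself; the grid values match the flags above, but `errors` is random placeholder data used only to make the snippet runnable.

```python
import numpy as np
import matplotlib.pyplot as plt

eta0_vals = [5e-05, 0.0001, 0.0005, 0.001, 0.005]
wd_vals = [0, 1e-05, 5e-05, 0.0001, 0.0005, 0.001, 0.005, 0.01]
errors = np.random.rand(len(eta0_vals), len(wd_vals))  # placeholder test errors

fig, ax = plt.subplots()
im = ax.imshow(errors, aspect="auto")
ax.set_xticks(range(len(wd_vals)))
ax.set_xticklabels([str(v) for v in wd_vals], rotation=45)
ax.set_yticks(range(len(eta0_vals)))
ax.set_yticklabels([str(v) for v in eta0_vals])
ax.set_xlabel("weight decay")
ax.set_ylabel("initial learning rate (eta0)")
fig.colorbar(im, ax=ax, label="final test error")
fig.savefig("heatmap_example.png")  # saved to disk, not shown
```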


`draw_histogram.py` draws histograms of the magnitudes of the model's gradients, updates, parameters, etc., within a given epoch. For example:

```
python draw_histogram.py --stats-file-path ../logs/CIFAR10/CIFAR10_ResNet110_NoBN_AdamW_Eta0_0.0005_WD_0.0001_Scheduler_None_Loss_Mul_1_Epoch_300_BatchSize_128_Test.pickle --fig-save-folder ../figs --bin-power-base 2 --max-bin-value 1 --epoch-index 150
```
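The `--bin-power-base 2` and `--max-bin-value 1` flags suggest bin edges at powers of 2, capped at 1. Below is a hedged guess at that binning scheme, not the script's code, using random placeholder magnitudes:

```python
import numpy as np
import matplotlib.pyplot as plt

mags = np.abs(np.random.randn(10000)) * 1e-3        # placeholder |gradient| values
lo = int(np.floor(np.log2(mags[mags > 0].min())))   # smallest power of 2 needed
edges = 2.0 ** np.arange(lo, 1)                     # ..., 2**-2, 2**-1, 2**0 = 1

fig, ax = plt.subplots()
ax.hist(mags, bins=edges)
ax.set_xscale("log", base=2)
ax.set_xlabel("magnitude")
ax.set_ylabel("count")
fig.savefig("histogram_example.png")  # saved to disk, not shown
```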
