The optimizers esgd, eadam, ALTO, and Lamb are located in the ./optimizers folder. 

If you want to compare with AdaBelief, please install adabelief_pytorch.
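
As a rough sketch, a script can check whether the optional dependency is present before starting a comparison run. It assumes the package is installed from PyPI as `adabelief-pytorch` and exposes an `AdaBelief` class; the helper name `load_adabelief` is hypothetical and not part of this repository.

```python
import importlib.util


def load_adabelief():
    """Return the AdaBelief optimizer class, or None if the package is missing."""
    # find_spec returns None when the top-level module cannot be located.
    if importlib.util.find_spec("adabelief_pytorch") is None:
        return None
    # Imported lazily so the rest of the code works without the package.
    from adabelief_pytorch import AdaBelief
    return AdaBelief


if __name__ == "__main__":
    if load_adabelief() is None:
        print("AdaBelief not found; install it with: pip install adabelief-pytorch")
    else:
        print("AdaBelief is available for comparison runs.")
```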

Datasets should be downloaded to the ./datasets folder.

The large-batch training experiments use the following batch size for each task:

- cifar10: 16384
- cifar100: 16384
- ImageNet-1k (resnet34): 4096
- BiLSTM for CoNLL: 14987
- BERTs (fine-tune): 1024
- VGG-16: 16384
- DenseNet: 16384
- gpt2 (pretrain): 4096
- ImageNet-1k (resnet50): 2k, 4k, 8k, 16k, 32k
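
The batch sizes above can be kept in a small lookup table. The sketch below also shows gradient accumulation, one common way to reach a large effective batch size when it does not fit in memory in a single step. The source does not say whether the repository does this; the dictionary keys, the `accumulation_steps` helper, and the micro-batch size of 512 are all hypothetical.

```python
# Batch sizes copied from the list above. The ResNet-50 entry is omitted
# because it is swept over several values rather than fixed.
LARGE_BATCH_SIZES = {
    "cifar10": 16384,
    "cifar100": 16384,
    "imagenet1k_resnet34": 4096,
    "bilstm_conll": 14987,
    "bert_finetune": 1024,
    "vgg16": 16384,
    "densenet": 16384,
    "gpt2_pretrain": 4096,
}


def accumulation_steps(task: str, micro_batch: int) -> int:
    """Number of micro-batches to accumulate per optimizer step."""
    target = LARGE_BATCH_SIZES[task]
    # Ceiling division, so the effective batch is never below the target.
    return -(-target // micro_batch)


if __name__ == "__main__":
    # Hypothetical micro-batch size of 512 samples per forward/backward pass.
    for task in LARGE_BATCH_SIZES:
        print(task, accumulation_steps(task, 512))
```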



Key software packages and versions:
- torch                     1.13.1+cu116 
- torchvision               0.14.1+cu116
- python                    3.8.18
- transformers              4.34.0
- scikit-learn              1.3.2 
- numpy                     1.24.4 
- pandas                    1.2.4 
- tqdm                      4.66.1 
- matplotlib                3.5.1 
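
To compare a local environment against the versions listed above, something like the following could be used. It is only a sketch: `EXPECTED` and `check_versions` are hypothetical names, and local build tags such as `+cu116` are stripped before comparing, so a different CUDA build of the same release still counts as a match.

```python
from importlib import metadata

# Versions copied from the list above, without the "+cu116" build tag.
EXPECTED = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
    "transformers": "4.34.0",
    "scikit-learn": "1.3.2",
    "numpy": "1.24.4",
    "pandas": "1.2.4",
    "tqdm": "4.66.1",
    "matplotlib": "3.5.1",
}


def check_versions(expected):
    """Map each package name to (installed_version_or_None, matches_expected)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            report[name] = (None, False)
            continue
        # Drop any local build tag (the part after "+") before comparing.
        report[name] = (installed, installed.split("+")[0] == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:15s} expected {EXPECTED[name]:8s} found {installed} [{status}]")
```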


All experiments were run on a single compute node with 4 A100 GPUs (80 GB of VRAM each), so the source code does not implement multi-node parallelism.

For the experiments on typical 2D optimization test functions (GIF), please use 2d_test.
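
For readers unfamiliar with this kind of experiment, here is a minimal, self-contained sketch of tracing an optimizer's path on a 2D test function. It is not the code in 2d_test: the Rosenbrock function is assumed as a representative test function, plain gradient descent stands in for the optimizers in this repository, and the start point, learning rate, and step count are arbitrary choices.

```python
def rosenbrock(x, y, a=1.0, b=100.0):
    """Rosenbrock function; its global minimum is 0 at (a, a**2)."""
    return (a - x) ** 2 + b * (y - x ** 2) ** 2


def rosenbrock_grad(x, y, a=1.0, b=100.0):
    """Analytic gradient of the Rosenbrock function."""
    dx = -2.0 * (a - x) - 4.0 * b * x * (y - x ** 2)
    dy = 2.0 * b * (y - x ** 2)
    return dx, dy


def gradient_descent(start, lr=1e-3, steps=2000):
    """Run plain gradient descent and return the list of visited points."""
    x, y = start
    path = [(x, y)]
    for _ in range(steps):
        dx, dy = rosenbrock_grad(x, y)
        x, y = x - lr * dx, y - lr * dy
        path.append((x, y))
    return path


if __name__ == "__main__":
    path = gradient_descent(start=(-1.0, 1.0))
    print("initial loss:", rosenbrock(*path[0]))
    print("final loss:  ", rosenbrock(*path[-1]))
```

The returned `path` is the sequence of points that an animation would draw frame by frame over a contour plot of the function.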