Softmax
```
CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch \
    --master_port 25269 --nproc_per_node=4 --use_env main.py \
    --model deit_tiny_patch16_224 \
    --batch-size 256 \
    --data-path 'dataset/imagenet' \
    --output_dir '/model_output/deit_tiny_patch16_224'
```
MRA-2-s
```
CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch \
    --master_port 25269 --nproc_per_node=4 --use_env main.py \
    --model deit_tiny_MRA2_full_patch16_224 \
    --batch-size 256 \
    --data-path 'dataset/imagenet' \
    --output_dir '/model_output/deit_tiny_MRA2_full_patch16_224'
```
MrsFormer
```
CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch \
    --master_port 25269 --nproc_per_node=4 --use_env main.py \
    --model deit_tiny_MrsFormer_patch16_224 \
    --batch-size 256 \
    --data-path 'dataset/imagenet' \
    --output_dir '/model_output/deit_tiny_MrsFormer_patch16_224'
```
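
The three commands above differ only in the `--model` name and the matching `--output_dir`. As a convenience, the dry-run sketch below prints all three from one template. The names `build_cmd`, `MODELS`, `DATA_PATH`, and `OUTPUT_ROOT` are illustrative and not part of this repository; the flags and values are copied from the commands above.

```shell
# Illustrative helper, not part of the repository: prints the launch
# command for each attention variant without running anything.
DATA_PATH='dataset/imagenet'   # same dataset path as in the commands above
OUTPUT_ROOT='/model_output'    # each run writes to $OUTPUT_ROOT/<model name>
# Softmax, MRA-2-s, MrsFormer (in that order)
MODELS='deit_tiny_patch16_224 deit_tiny_MRA2_full_patch16_224 deit_tiny_MrsFormer_patch16_224'

# build_cmd MODEL: print one full launch command for MODEL.
build_cmd() {
    printf "CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --master_port 25269 --nproc_per_node=4 --use_env main.py --model %s --batch-size 256 --data-path '%s' --output_dir '%s/%s'\n" "$1" "$DATA_PATH" "$OUTPUT_ROOT" "$1"
}

# Dry run: one command per line, in the order listed in MODELS.
for model in $MODELS; do
    build_cmd "$model"
done
```

To launch a run instead of printing it, the output of a single `build_cmd` call could be passed to `eval`. The runs would need to go one at a time, since all three commands use the same GPUs and the same `--master_port`.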