### requirements
```text
torch>=2.0
nvidia-dali>=1.27.0
```


### training
```shell
python run_train.py


# Sample log from a run on 8 A100 GPUs. Our actual training run used 80 A100 GPUs.
# rank-id:0:rank 155.76 total 1246.06 its/s lr: 0.000004 step: 28 required: 2856 hours scale: 512 
# name: LAION_0       epoch: 0            lr: 0.0000          loss: 12.12         
# rank-id:0:rank 155.61 total 1244.88 its/s lr: 0.000004 step: 30 required: 2851 hours scale: 512 
# name: LAION_0       epoch: 0            lr: 0.0000          loss: 12.11         
# rank-id:0:rank 155.71 total 1245.72 its/s lr: 0.000004 step: 32 required: 2847 hours scale: 512 
# name: LAION_0       epoch: 0            lr: 0.0000          loss: 12.10         
# rank-id:0:rank 155.74 total 1245.95 its/s lr: 0.000005 step: 34 required: 2843 hours scale: 512 
```
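If you want to track throughput over time, the progress lines above can be parsed with a small helper. This is only a sketch: the function name `parse_progress_line` is hypothetical and not part of this repo, and the field meanings (per-rank and total iterations per second, estimated hours required) are our reading of the sample log, not a documented format.

```python
import re

# Pattern written against the sample log lines above; the log format itself
# is inferred from that sample and may differ in other versions.
PROGRESS_RE = re.compile(
    r"rank-id:(?P<rank_id>\d+):rank\s+(?P<rank_speed>[\d.]+)\s+"
    r"total\s+(?P<total_speed>[\d.]+)\s+its/s\s+"
    r"lr:\s+(?P<lr>[\d.]+)\s+step:\s+(?P<step>\d+)\s+"
    r"required:\s+(?P<hours>\d+)\s+hours\s+scale:\s+(?P<scale>\d+)"
)


def parse_progress_line(line):
    """Return the fields of one progress line as a dict, or None if it does not match."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    return {
        "rank_id": int(match["rank_id"]),
        "rank_speed": float(match["rank_speed"]),    # its/s reported for this rank
        "total_speed": float(match["total_speed"]),  # its/s reported across all ranks
        "lr": float(match["lr"]),
        "step": int(match["step"]),
        "hours": int(match["hours"]),                # value printed after "required:"
        "scale": int(match["scale"]),
    }
```

In the sample log, the total speed is roughly 8 times the per-rank speed, which matches the 8 GPUs used for that run.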

### distributed training
Edit the IP addresses in `ip_list` in `run_train.py` to match your servers, and make sure every server can be accessed without a password.

```python
# for example
ip_list = [
    "172.16.9.10",
    "172.16.9.11",
    "172.16.9.12",
    "172.16.9.13",
    "172.16.9.14",
    "172.16.9.15",
    "172.16.9.16",
    "172.16.9.17",
    "172.16.9.18",
    "172.16.9.19",
]
port = 39999
```
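Before launching, it can help to confirm that every address in `ip_list` really accepts password-free logins. The sketch below assumes the servers are reached over SSH with key-based authentication, which this README does not state explicitly; `build_ssh_check` and `find_unreachable` are hypothetical helper names, not functions from this repo.

```python
import subprocess


def build_ssh_check(host, timeout=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never fall back to a password prompt
        "-o", f"ConnectTimeout={timeout}",   # give up quickly on unreachable hosts
        host,
        "true",                              # no-op remote command
    ]


def find_unreachable(ip_list, runner=subprocess.run):
    """Return the hosts whose ssh check exits with a non-zero status."""
    failed = []
    for host in ip_list:
        result = runner(build_ssh_check(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed


# Example usage (assumption: run from the machine that starts the training):
#   bad_hosts = find_unreachable(["172.16.9.10", "172.16.9.11"])
#   if bad_hosts:
#       print("password-free login failed for:", bad_hosts)
```

An empty result means every host accepted the connection without asking for a password. The `runner` argument exists only so the check can be exercised without a real network.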
