### Distributed training

Models larger than 30B parameters require multi-node training (2 nodes) with DeepSpeed ZeRO-3.

The examples here use 4 machines, with addresses `i201`, `i202`, `i203`, and `i204`.
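
As a rough illustration, the four machines could be listed in a hostfile of the form used by the DeepSpeed launcher (one `<host> slots=<gpus>` line per machine). This is a minimal sketch, not necessarily the setup used in this repository: the helper `make_hostfile` is hypothetical, and the GPU count per node is an assumption, since it is not stated here.

```python
# Hypothetical helper that builds the text of a DeepSpeed-style hostfile.
HOSTS = ["i201", "i202", "i203", "i204"]  # the four machines named above
GPUS_PER_NODE = 8  # assumption: the GPU count per node is not given in this document


def make_hostfile(hosts, slots):
    """Return one '<host> slots=<n>' line per machine."""
    return "\n".join(f"{host} slots={slots}" for host in hosts) + "\n"


if __name__ == "__main__":
    # Print the hostfile text; redirect it to a file to use it with a launcher.
    print(make_hostfile(HOSTS, GPUS_PER_NODE), end="")
```

Adjust `GPUS_PER_NODE` to match the actual hardware before using the generated file.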