This is the implementation for the submission titled "AsyncMesh: Fully Asynchronous Optimization for Data and Pipeline Parallelism".

## Instructions

First, install the dependencies listed in `requirements.txt`, then run the bash script `run.bash`. The script assumes an instance with at least 8 GPUs and runs our method for the base model on the `4 x 2` mesh. Tested with PyTorch 2.5.1, CUDA 12.6, and Python 3.12.
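To make the `4 x 2` mesh concrete, the following minimal sketch shows one way 8 GPU ranks could be arranged into a 4 x 2 grid. It is illustrative only and is not taken from this repository: the row-major rank ordering and the helper names `mesh_coords` and `mesh_groups` are assumptions, and which axis corresponds to data parallelism and which to pipeline parallelism is not stated here, so the axes are simply called rows and columns.

```python
# Illustrative sketch only: the row-major layout and the helper names are
# assumptions, not the repository's actual implementation.

MESH_SHAPE = (4, 2)  # the "4 x 2" mesh: 4 rows, 2 columns, 8 ranks in total


def mesh_coords(rank, shape=MESH_SHAPE):
    """Return the (row, column) position of a global rank, row-major."""
    rows, cols = shape
    if not 0 <= rank < rows * cols:
        raise ValueError(f"rank {rank} is outside a {rows} x {cols} mesh")
    return divmod(rank, cols)


def mesh_groups(shape=MESH_SHAPE):
    """Return the ranks that share a row and the ranks that share a column."""
    rows, cols = shape
    row_groups = [[r * cols + c for c in range(cols)] for r in range(rows)]
    col_groups = [[r * cols + c for r in range(rows)] for c in range(cols)]
    return row_groups, col_groups


if __name__ == "__main__":
    row_groups, col_groups = mesh_groups()
    print("groups along rows:   ", row_groups)
    print("groups along columns:", col_groups)
    for rank in range(MESH_SHAPE[0] * MESH_SHAPE[1]):
        print(f"rank {rank} -> mesh position {mesh_coords(rank)}")
```

Under this assumed layout, ranks in the same row form one communication group and ranks in the same column form the other, which is why the script needs at least 4 * 2 = 8 GPUs.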

## Credits
- [AsyncPP](https://github.com/PluralisResearch/AsyncPP)
- [SPARTA](https://github.com/matttreed/diloco-sim)



