This codebase implements the model described in our paper. Because the datasets are large (151 GB for the single-chain polymer dataset, 884 GB for the Li-ion battery dataset), we cannot include them in this submission. However, we include all source code for peer review. The procedure described here runs a complete experiment, from training through simulation to evaluation.

### Steps to run this code:

1. Create and activate a conda environment:

```
conda create -n graphwm python=3.7
conda activate graphwm
```

2. Install the specified PyTorch and torch_geometric versions (CUDA 10.2 must already be installed):
   
```
pip install torch==1.9.0

pip install --user --no-cache-dir torch-scatter -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
pip install --user --no-cache-dir torch-sparse -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
pip install --user --no-cache-dir torch-cluster -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
pip install --user --no-cache-dir torch-spline-conv -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
pip install --user --no-cache-dir torch-geometric
```
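Before moving on, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch, not part of the submitted code: the helper names `wheel_index_url` and `installed_version` are hypothetical, and the module names assume the usual import names of the packages installed above.

```python
import importlib


def wheel_index_url(torch_version="1.9.0", cuda_tag="cu102"):
    """Build the wheel index URL used by the pip commands in this step."""
    return f"https://pytorch-geometric.com/whl/torch-{torch_version}+{cuda_tag}.html"


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except (ImportError, OSError):
        # OSError can occur when a compiled extension fails to load.
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    print("wheel index:", wheel_index_url())
    # Assumed import names for the packages installed in this step.
    for name in ("torch", "torch_scatter", "torch_sparse", "torch_cluster",
                 "torch_spline_conv", "torch_geometric"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If `torch` reports a version other than 1.9.0, or any of the extension packages shows up as not installed, the wheels likely did not match the environment and this step should be repeated.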

3. Install other dependencies:
   
```
pip install -r requirements.txt
```

4. Install our code as a package:

```
pip install -e ./
```

5. Go to the `graphwm` folder and train on the toy-sized data (the training config is in `conf/train.yaml`). For convenience, this code submission uses relative paths; we recommend using absolute paths everywhere. We use `wandb` for logging, so a `wandb` account and login are needed to use the logging functionality. The `.env` files should also be modified to match your setup.

```
python train.py
```
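One way to follow the recommendation to use absolute paths is to resolve every relative path once, before writing it into a config or `.env` file. This is a minimal sketch under assumptions: `to_absolute` is a hypothetical helper, and how the resulting paths are consumed by the configs is not shown here.

```python
from pathlib import Path


def to_absolute(path, base=None):
    """Resolve `path` to an absolute path.

    Relative paths are interpreted against `base`, which defaults to the
    current working directory. Absolute paths are only normalized.
    """
    p = Path(path).expanduser()
    if not p.is_absolute():
        p = Path(base if base is not None else Path.cwd()) / p
    return p.resolve()


if __name__ == "__main__":
    # Prints the absolute location of the training config relative to the
    # directory this script is run from.
    print(to_absolute("conf/train.yaml"))
```

Resolving against an explicit `base` rather than the working directory keeps the result stable when the script is launched from a different folder.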

6. Evaluate the model (the evaluation config is in `conf/eval.yaml`). Performance is best with a large batch size, which simulates many systems in parallel on a single GPU.

```
python eval_rg.py
```
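The effect of batch size on the number of simulation passes can be sketched as follows. This is an illustration only: `batch_ranges` and `num_batches` are hypothetical helpers, the numbers are made up, and the actual batching logic in the evaluation script may differ.

```python
import math


def batch_ranges(num_systems, batch_size):
    """Yield (start, stop) index ranges that cover all systems in order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, num_systems, batch_size):
        yield start, min(start + batch_size, num_systems)


def num_batches(num_systems, batch_size):
    """Number of batches needed to simulate every system once."""
    return math.ceil(num_systems / batch_size)


if __name__ == "__main__":
    # Illustrative numbers: 100 systems with two different batch sizes.
    print(num_batches(100, 8))    # 13 batches
    print(num_batches(100, 32))   # 4 batches
    print(list(batch_ranges(10, 4)))  # [(0, 4), (4, 8), (8, 10)]
```

Fewer, larger batches mean fewer sequential passes over the GPU, which is why the largest batch size that fits in memory is usually the better choice.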
