
# Graph Diffusion Experiments – Reproducibility Package

This folder contains **ten** Jupyter notebooks that reproduce every result in the paper and add a new large‑scale test.  
Each notebook runs in **Google Colab** on a single **T4 GPU**.

> **Quick start**  
> 1. Open any notebook in Colab.  
> 2. *Runtime ▷ Change runtime type* → choose **GPU**.  
> 3. Run every cell from top to bottom.

---

## 1. Notebook list

| Notebook | Purpose | Main outputs |
|----------|---------|--------------|
| `DiGress_QM9.ipynb` | DiGress on QM9 | Accuracy, validity, novelty |
| `GDSS_QM9.ipynb` | GDSS on QM9 | Same metrics for GDSS |
| `EXP1_EFPC.ipynb` | Check Edge‑Flip Posterior Concentration | Confidence curves |
| `EXP2_ETDB.ipynb` | Check Edge‑Target Deviation Bound | Target‑error curves |
| `EXP3_MDEP.ipynb` | Check Mean Deviation of Edge Posterior | Error bars |
| `EXP4_Couped.ipynb` | Coupled structure–feature study | Joint error plots |
| `Coupled_factor_γ_Analysis.ipynb` | Ablation on coupling factor γ | Heat maps |
| `soc_Digress.ipynb` | Second‑order DiGress | Checkpoints, summary |
| `soc_GDSS.ipynb` | Second‑order GDSS | Checkpoints, summary |
| `Soc_large_scale_GDSS.ipynb` | **NEW** large‑scale GDSS (≈50 k nodes) | Runtime and memory figures |

Each notebook builds a local `results/` folder with figures, logs, and checkpoints.  
The last cell zips that folder so you can download everything with one click.
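
As a rough illustration of that final step, a zip cell could look like the sketch below. The function name `zip_results` and the archive name are assumptions made for this example; the notebooks' actual cell may differ.

```python
import shutil
from pathlib import Path


def zip_results(results_dir: str = "results", archive_name: str = "results") -> Path:
    """Pack the contents of `results_dir` into `<archive_name>.zip`.

    Hypothetical helper for illustration; not taken from the notebooks.
    """
    src = Path(results_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"No results folder at {src.resolve()}")
    # make_archive appends ".zip" itself and returns the final archive path.
    archive = shutil.make_archive(archive_name, "zip", root_dir=src)
    return Path(archive)
```

In Colab, the returned path can then be passed to `google.colab.files.download` to trigger the browser download.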

---

## 2. Environment

The first cell in each notebook installs all packages:

```bash
pip install -r requirements.txt
```

Main versions:

* PyTorch 2.3  
* PyTorch Geometric 2.5  
* RDKit 2023.09  
* NetworkX 3.2  
* NumPy 1.26  
* Matplotlib 3.9  
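
To confirm that an environment matches the versions listed above, a check along these lines may help. It is a minimal sketch: the distribution names in `EXPECTED` (for example `torch_geometric` and `rdkit`) are assumptions and should be adjusted to whatever `requirements.txt` actually pins. Only the first two version components are compared, so `2023.09` and `2023.9.5` count as a match.

```python
import re
from importlib import metadata

# Assumed distribution names; adjust to match requirements.txt.
EXPECTED = {
    "torch": "2.3",
    "torch_geometric": "2.5",
    "rdkit": "2023.09",
    "networkx": "3.2",
    "numpy": "1.26",
    "matplotlib": "3.9",
}


def major_minor(version: str) -> tuple:
    """Return the first two numeric components, e.g. '2.3.0+cu121' -> (2, 3)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:2])


def check_versions(expected, lookup=metadata.version) -> dict:
    """Map each package to (wanted, found or None, matches)."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            report[name] = (wanted, None, False)
            continue
        report[name] = (wanted, found, major_minor(found) == major_minor(wanted))
    return report


for pkg, (wanted, found, ok) in check_versions(EXPECTED).items():
    status = "ok" if ok else "MISMATCH"
    print(f"{pkg:16s} wanted {wanted:8s} found {found}  [{status}]")
```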

For local runs, use a CUDA GPU with at least **12 GB** of memory and set `DEVICE="cuda"`.
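
One way to set `DEVICE` defensively on a local machine is sketched below. The helper `pick_device` is hypothetical and not part of the notebooks; it falls back to `"cpu"` when PyTorch or CUDA is unavailable and prints a warning when the first GPU reports less memory than the suggested 12 GB.

```python
def pick_device(min_gb: float = 12.0) -> str:
    """Return "cuda" if a CUDA GPU is usable, otherwise "cpu".

    Hypothetical helper; the notebooks may simply hard-code DEVICE.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    # total_memory is reported in bytes for the given device index.
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    if total_gb < min_gb:
        print(f"Warning: GPU has {total_gb:.1f} GB, below the suggested {min_gb:.0f} GB")
    return "cuda"


DEVICE = pick_device()
print(f"Using DEVICE={DEVICE!r}")
```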

---

## 3. Suggested order

1. `DiGress_QM9.ipynb`  
2. `GDSS_QM9.ipynb`  
3. `EXP1_EFPC.ipynb`  
4. `EXP2_ETDB.ipynb`  
5. `EXP3_MDEP.ipynb`  
6. `EXP4_Couped.ipynb`  
7. `Coupled_factor_γ_Analysis.ipynb`  
8. `soc_Digress.ipynb`  
9. `soc_GDSS.ipynb`  
10. `Soc_large_scale_GDSS.ipynb`
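
For local runs, the order above could also be executed without opening each notebook by driving `jupyter nbconvert` from a script. This is only a sketch under the assumption that Jupyter is installed and the notebooks sit in the current directory; `run_all` and `nbconvert_command` are names invented for this example. The cell timeout is disabled because training cells run for hours.

```python
import subprocess

# Same order as the list above.
NOTEBOOKS = [
    "DiGress_QM9.ipynb",
    "GDSS_QM9.ipynb",
    "EXP1_EFPC.ipynb",
    "EXP2_ETDB.ipynb",
    "EXP3_MDEP.ipynb",
    "EXP4_Couped.ipynb",
    "Coupled_factor_γ_Analysis.ipynb",
    "soc_Digress.ipynb",
    "soc_GDSS.ipynb",
    "Soc_large_scale_GDSS.ipynb",
]


def nbconvert_command(notebook: str) -> list:
    """Build the argv that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", "--inplace",
        "--ExecutePreprocessor.timeout=-1",  # -1 disables the per-cell timeout
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one after another."""
    for nb in notebooks:
        cmd = nbconvert_command(nb)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the sequence at the first failing notebook.
            subprocess.run(cmd, check=True)


run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` would actually execute the notebooks, which takes many hours in total.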

---

## 4. Tips

* **Seeds**: the seed is set in the first cell; change `SEED` to see run-to-run variance.  
* **Memory**: lower `BATCH_SIZE` if you hit out-of-memory (OOM) errors; the large‑scale test needs the most memory.  
* **Time**: training a baseline on a T4 takes 4–5 h, the large‑scale test takes about 7 h, and the evaluation notebooks finish in 30 min.
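
For reference, a seeding cell of the kind mentioned in the first tip might look like the sketch below. The helper name `set_seed` and the placeholder value of `SEED` are assumptions; the notebooks' own first cell is authoritative. NumPy and PyTorch are seeded only if they are installed.

```python
import random

SEED = 0  # placeholder value; the notebooks define their own SEED


def set_seed(seed: int) -> None:
    """Seed Python, NumPy, and PyTorch random number generators."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        # torch.manual_seed seeds the CPU generator and all CUDA devices.
        torch.manual_seed(seed)
    except ImportError:
        pass


set_seed(SEED)
```

Note that seeding alone does not guarantee bit-identical results on a GPU, since some CUDA kernels are nondeterministic.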

---

## 5. Troubleshooting

| Message | Cause | Fix |
|---------|-------|-----|
| `CUDA out of memory` | GPU memory exhausted | Lower `BATCH_SIZE` or enable AMP (mixed precision) |
| `ImportError: torch_geometric` | Wheel version mismatch | Rerun the install cell |
| RDKit warnings | Verbose parser | Safe to ignore or silence logger |
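
The batch-size fix in the first row can be automated with a simple retry loop. The sketch below is an illustration rather than code from the notebooks: `run_with_backoff` and the `step` callable are hypothetical, and the OOM check relies on the error message containing "out of memory", as PyTorch's CUDA OOM errors do. In a real run it is also worth calling `torch.cuda.empty_cache()` before retrying.

```python
def run_with_backoff(step, batch_size: int, min_batch_size: int = 1):
    """Call step(batch_size), halving batch_size after each OOM error.

    Returns (batch_size_that_worked, result_of_step).
    Errors that are not OOM errors are re-raised unchanged.
    """
    while True:
        try:
            return batch_size, step(batch_size)
        except RuntimeError as err:
            is_oom = "out of memory" in str(err).lower()
            if not is_oom or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
            print(f"OOM, retrying with BATCH_SIZE={batch_size}")


# Usage with a stand-in training step that "fits" only at 16 or below.
def fake_step(bs: int) -> str:
    if bs > 16:
        raise RuntimeError("CUDA out of memory. (simulated)")
    return "ok"


print(run_with_backoff(fake_step, batch_size=64))
```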

---

## 6. Licence and citation

Released under the Apache 2.0 licence. Please cite:

```bibtex
@inproceedings{Anonymous2025NoiseFreeGDM,
  title     = {Is Noise Conditioning Necessary? A Unified Theory of Unconditional Graph Diffusion Models},
  author    = {Anonymous},
  booktitle = {Proc. Neural Information Processing Systems},
  year      = {2025}
}
```

*Open an issue after the review phase if you have questions.*
