# Directed graph transformers meet metabolic networks

<div align="center">
    <img alt="General overview directed graph transformer" 
         title="Example" src="docs/imgs/dirgt.png" width="500">
</div>

This repository has been forked from [GraphGPS](https://github.com/rampasek/GraphGPS)
and adapted to the needs of the project. The same installation and usage instructions
apply.

The proposed Directed Graph Transformer is composed of three main components:
1. Positional encoding: [LapPE](https://arxiv.org/abs/2106.03893), [MagLapPE](https://arxiv.org/abs/2302.00049).
2. Directed message-passing mechanism: [Dir-GNN](https://arxiv.org/abs/2305.10498).
3. Global attention mechanism: [Transformer](https://arxiv.org/abs/1706.03762). Two variants are implemented:
   - [Structure-Aware Transformer](https://arxiv.org/abs/2202.03036) (SAT).
   - [GraphGPS](https://arxiv.org/abs/2205.12454) (GPS).
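
To make the difference between the two positional encodings concrete, the sketch below builds an unnormalized magnetic Laplacian for a small directed graph. It is a minimal illustration, not the code used in this repository: the function name `magnetic_laplacian`, the dense list-of-lists representation, and the default potential `q=0.25` are assumptions, and the sign convention and normalization may differ from the actual implementation. With `q=0` the matrix reduces to the Laplacian of the symmetrized graph, which is the one whose eigenvectors LapPE uses; with `q>0` the edge direction is encoded in the complex phase.

```python
import cmath
import math


def magnetic_laplacian(num_nodes, edges, q=0.25):
    """Return L = D_s - A_s * exp(i * Theta) as a dense complex matrix.

    A_s is the symmetrized adjacency, D_s its degree matrix, and
    Theta = 2 * pi * q * (A - A^T) encodes the edge direction as a phase.
    """
    # Dense adjacency matrix of the directed graph.
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = 1.0

    lap = [[0j] * num_nodes for _ in range(num_nodes)]
    for u in range(num_nodes):
        degree = 0.0
        for v in range(num_nodes):
            sym = 0.5 * (adj[u][v] + adj[v][u])
            theta = 2.0 * math.pi * q * (adj[u][v] - adj[v][u])
            lap[u][v] = -sym * cmath.exp(1j * theta)
            degree += sym
        # Add the symmetrized degree on the diagonal.
        lap[u][u] += degree
    return lap


# Directed path 0 -> 1 -> 2 (hypothetical toy graph).
example = magnetic_laplacian(3, [(0, 1), (1, 2)], q=0.25)
```

The resulting matrix is Hermitian, so its eigenvalues are real and its (complex) eigenvectors can serve as direction-aware positional encodings.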

### 1. Dataset

This project uses a dataset of genome-scale metabolic networks in which the
labels mark the essential reactions of each network. The dataset is
available on [Figshare](https://figshare.com/s/28d3130996b349e05912).

### 2. Installation

This repository, like the original *GraphGPS*, is built on [PyG](https://www.pyg.org/) and [GraphGym from PyG2](https://pytorch-geometric.readthedocs.io/en/2.0.0/notes/graphgym.html),
and requires *PyG v2.2*.

```bash
conda create -n dirgt python=3.10
conda activate dirgt

conda install pytorch=1.13 torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install pyg=2.2 -c pyg -c conda-forge
pip install pyg-lib -f https://data.pyg.org/whl/torch-1.13.0+cu117.html

# RDKit is required for OGB-LSC PCQM4Mv2 and datasets derived from it.  
conda install openbabel fsspec rdkit -c conda-forge

pip install pytorch-lightning yacs torchmetrics
pip install performer-pytorch
pip install tensorboardX
pip install ogb
pip install wandb
pip install numba

conda clean --all
```


### 3. Running

All the available configuration files are located in the `configs/GEM` directory.

```bash
conda activate dirgt

# Running GPS with LapPE positional encoding on the genome-scale metabolic networks dataset.
python main.py --cfg configs/GEM/lap-GPS-DirGated.yaml  wandb.use False

# Running SAT with MagLapPE positional encoding on the genome-scale metabolic networks dataset.
python main.py --cfg configs/SAN/mag-SAT-DirGAT.yaml  wandb.use False

# Running SAT with MagLapPE positional encoding using a particular random seed.
python main.py --cfg configs/SAN/mag-SAT-DirGAT.yaml --repeat 1  seed 42  wandb.use False
```

You can also run multiple instances of the same experiment in parallel with
the Slurm job scheduling system. See the original [GraphGPS](https://github.com/rampasek/GraphGPS)
repository for more details.
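
Where Slurm is not available, the same configuration can be run sequentially with several seeds from a small driver script. The sketch below is an assumption-laden example rather than part of this repository: the helper names `build_command` and `run_seeds` and the seed values are hypothetical, and only the `--cfg`, `--repeat`, `seed`, and `wandb.use` arguments come from the commands shown above. By default it only prints the commands (`dry_run=True`).

```python
import subprocess
import sys


def build_command(cfg_path, seed, use_wandb=False):
    """Return the argument list for one run of main.py with a fixed seed."""
    return [
        sys.executable, "main.py",
        "--cfg", cfg_path,
        "--repeat", "1",
        "seed", str(seed),
        "wandb.use", str(use_wandb),
    ]


def run_seeds(cfg_path, seeds, dry_run=True):
    """Print one command per seed and, unless dry_run is set, execute it."""
    commands = [build_command(cfg_path, seed) for seed in seeds]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Stop at the first failing run instead of silently continuing.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Illustrative seeds; replace with the ones needed for the experiment.
    run_seeds("configs/GEM/lap-GPS-DirGated.yaml", seeds=[0, 1, 2], dry_run=True)
```

Set `dry_run=False` to launch the runs one after another from the project root directory.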


#### W&B logging
To use W&B logging, set `wandb.use True` and have a `gtransformers` entity set up in your W&B account (or point to a different entity by setting `wandb.entity`).



#### Unit tests

To run all unit tests, execute from the project root directory:

```bash
python -m unittest -v
```

Or specify a particular test module, e.g.:

```bash
python -m unittest -v unittests.test_eigvecs
```


## Citation

```bibtex

```
