# Fragment-Wise Interpretability in Graph Neural Networks via Molecule Decomposition and Contribution Analysis


The **SEAL** project aims to create an interpretable GNN model whose predictions are based on the contributions of defined chemical fragments. Fragment-level explanations are intended to increase the trust of users (chemists, biologists) in the model's predictions and to facilitate the identification of potential drug candidates.


We develop SEALCONV, a novel graph neural network architecture that limits the spread of information between different fragments. This enhances the interpretability of a given subgraph by reducing the influence of surrounding fragments.
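
The idea of limiting information flow between fragments can be illustrated with a minimal sketch. This is not the SEALCONV implementation: the function name `fragment_aware_mean`, the `inter_weight` parameter, and the weighted-mean aggregation rule are all assumptions made for illustration. Messages inside a fragment pass at full strength, while messages that cross a fragment boundary are scaled down (or blocked when `inter_weight` is 0).

```python
from typing import List, Sequence, Tuple


def fragment_aware_mean(
    features: Sequence[float],
    edges: Sequence[Tuple[int, int]],
    fragment_of: Sequence[int],
    inter_weight: float = 0.0,
) -> List[float]:
    """One round of weighted-mean message passing on an undirected graph.

    Hypothetical illustration: intra-fragment messages have weight 1.0,
    messages crossing a fragment boundary are scaled by `inter_weight`.
    Each atom also keeps its own feature with weight 1.0 (a self-loop).
    """
    totals = [float(x) for x in features]  # self-loop contribution
    weights = [1.0] * len(features)
    for u, v in edges:
        w = 1.0 if fragment_of[u] == fragment_of[v] else inter_weight
        totals[u] += w * features[v]
        weights[u] += w
        totals[v] += w * features[u]
        weights[v] += w
    return [t / w for t, w in zip(totals, weights)]


# Toy chain of four atoms 0-1-2-3 split into two fragments: {0, 1} and {2, 3}.
features = [1.0, 1.0, 5.0, 5.0]
edges = [(0, 1), (1, 2), (2, 3)]
fragment_of = [0, 0, 1, 1]

blocked = fragment_aware_mean(features, edges, fragment_of, inter_weight=0.0)
mixed = fragment_aware_mean(features, edges, fragment_of, inter_weight=1.0)
print(blocked)  # [1.0, 1.0, 5.0, 5.0]
print(mixed)
```

With `inter_weight=0.0` each fragment's features stay untouched by its neighbour, whereas `inter_weight=1.0` behaves like ordinary mean aggregation and lets atoms 1 and 2 absorb information from the other fragment.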

Fragmenting molecule graphs: we select the best method for dividing molecules into fragments, exploring existing techniques such as:

**BRICS** (Breaking of Retrosynthetically Interesting Chemical Substructures): This method defines different chemical environments represented by special atoms that connect fragments and indicate the type of chemical environment at a given cleavage site. This ensures that the obtained fragments retain their functional meaning within a defined chemical context.
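
A minimal sketch of how BRICS cleavage sites could be turned into per-atom fragment labels is shown below. The helper names `brics_cut_bonds` and `assign_fragments` are hypothetical and not taken from this repository. The first helper relies on RDKit's `BRICS.FindBRICSBonds`; the second is plain Python and labels atoms by connected component once the cut bonds are removed.

```python
from typing import List, Sequence, Tuple


def brics_cut_bonds(smiles: str) -> List[Tuple[int, int]]:
    """Return atom-index pairs of bonds that BRICS would cleave (requires RDKit)."""
    # Imported lazily so that assign_fragments below runs without RDKit.
    from rdkit import Chem
    from rdkit.Chem import BRICS

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # FindBRICSBonds yields ((atom_i, atom_j), (env_i, env_j)) per cleavable bond.
    return [tuple(atoms) for atoms, _envs in BRICS.FindBRICSBonds(mol)]


def assign_fragments(
    num_atoms: int,
    bonds: Sequence[Tuple[int, int]],
    cut_bonds: Sequence[Tuple[int, int]],
) -> List[int]:
    """Label each atom with a fragment id (connected components without cut bonds)."""
    cut = {frozenset(b) for b in cut_bonds}
    neighbours: List[List[int]] = [[] for _ in range(num_atoms)]
    for u, v in bonds:
        if frozenset((u, v)) not in cut:
            neighbours[u].append(v)
            neighbours[v].append(u)

    fragment_of = [-1] * num_atoms
    next_id = 0
    for start in range(num_atoms):
        if fragment_of[start] != -1:
            continue
        # Depth-first traversal over the bonds that were kept.
        fragment_of[start] = next_id
        stack = [start]
        while stack:
            atom = stack.pop()
            for nb in neighbours[atom]:
                if fragment_of[nb] == -1:
                    fragment_of[nb] = next_id
                    stack.append(nb)
        next_id += 1
    return fragment_of


# Toy example without RDKit: a 5-atom chain where the bond (2, 3) is cut.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
fragment_of = assign_fragments(5, bonds, cut_bonds=[(2, 3)])
print(fragment_of)  # [0, 0, 0, 1, 1]
```

With RDKit installed, the bond list for a real molecule can be read from `mol.GetBonds()` via `GetBeginAtomIdx()` and `GetEndAtomIdx()`, and the result of `brics_cut_bonds` can be passed as `cut_bonds`.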


🛠️ Setup

```bash
python -m venv venv 
source venv/bin/activate
pip install -r requirements.txt
```

🗃️ Data

The dataset used in this project comes from TDC (https://tdcommons.ai/). It contains chemical compounds and their properties, which are used to train and evaluate the GNN model. The dataset is divided into training, validation, and test sets.
The data is stored in the `data` directory. You can select the dataset and task with the following arguments:
```python
parser.add_argument('--data-set', default='rings-count', choices=['sol', 'cyp', 'herg_k'],
                    type=str, help='dataset type')
parser.add_argument('--task', default='classification',
                    type=str, choices=['regression', 'classification'])
parser.add_argument('--num-classes', type=int,
                    default=1, help='Number of classes')
```

Synthetic datasets come from the B-XAIC repository; for download instructions, see: https://github.com/mproszewska/B-XAIC
Download the following files and place them in the `data/` directory:

- `data/data.csv`
- `data/explanations.sdf`

You can select the synthetic dataset with the following arguments:
```python
parser.add_argument('--data-set', default='rings-count', choices=['rings-count', 'rings-max', 'X', 'P', 'B', 'indole', 'PAINS'],
                    type=str, help='dataset type')

parser.add_argument('--task', default='classification',
                    type=str, choices=['classification'])
parser.add_argument('--num-classes', type=int,
                    default=1, help='Number of classes')
```
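
Before running on the synthetic datasets, it can help to confirm that the two files named above are in place. The following is a small sketch using only the standard library; the helper name `missing_files` is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List, Sequence

# File names taken from the README, relative to the data directory.
EXPECTED_FILES = ("data.csv", "explanations.sdf")


def missing_files(data_dir: str, expected: Sequence[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected file names that are not present in `data_dir`."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_files("data")
if missing:
    print("Missing synthetic dataset files:", ", ".join(missing))
else:
    print("All synthetic dataset files found.")
```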

Specify the dataset split index using
```python
parser.add_argument("--split", default=0, type=int, help="Split index for dataset")
```
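
To see how these flags behave together, here is a self-contained sketch that assembles them into a single parser. Merging the TDC and synthetic dataset names into one `choices` list is an assumption made for illustration, and `build_parser` is a hypothetical helper; the actual scripts may define these options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Assemble the dataset-related flags shown in this README (illustrative only)."""
    parser = argparse.ArgumentParser(description="Dataset options (illustrative)")
    parser.add_argument(
        "--data-set", default="rings-count", type=str, help="dataset type",
        # Assumption: TDC and synthetic dataset names merged into one list.
        choices=["sol", "cyp", "herg_k",
                 "rings-count", "rings-max", "X", "P", "B", "indole", "PAINS"],
    )
    parser.add_argument("--task", default="classification", type=str,
                        choices=["regression", "classification"])
    parser.add_argument("--num-classes", type=int, default=1, help="Number of classes")
    parser.add_argument("--split", default=0, type=int, help="Split index for dataset")
    return parser


# argparse converts the dashes in flag names to underscores in attribute names.
args = build_parser().parse_args(["--data-set", "cyp", "--task", "classification", "--split", "2"])
print(args.data_set, args.task, args.num_classes, args.split)  # cyp classification 1 2
```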

🏋️‍♂️ Training
```bash
python train.py \
     --batch-size $BATCH \
     --device cuda \
     --epochs $EPOCH \
     --num-layers $LAYERS \
     --model SEAL \
     --weight-decay $WEIGHT_DECAY \
     --hidden-dim $HIDDEN \
     --regularize $REGULARIZE \
     --drop $DROPOUT \
     --lr $LR \
     --task $TASK \
     --data-set $DATASET 

```
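
The shell variables in the command above (`$BATCH`, `$EPOCH`, and so on) must be set before running it. As an alternative, the same command line can be assembled in Python. The sketch below only builds and prints the command; every value in `hyperparams` is a placeholder chosen for illustration, not a recommended or published setting, and `build_train_command` is a hypothetical helper.

```python
import shlex
import sys
from typing import Dict, List

# Placeholder values only: NOT recommended or published hyperparameters.
hyperparams: Dict[str, object] = {
    "batch-size": 32,
    "device": "cuda",
    "epochs": 100,
    "num-layers": 3,
    "model": "SEAL",
    "weight-decay": 0.0,
    "hidden-dim": 64,
    "regularize": 0.1,
    "drop": 0.1,
    "lr": 0.001,
    "task": "classification",
    "data-set": "cyp",
}


def build_train_command(params: Dict[str, object], script: str = "train.py") -> List[str]:
    """Turn a flag -> value mapping into an argument list for the training script."""
    cmd = [sys.executable, script]
    for flag, value in params.items():
        cmd += [f"--{flag}", str(value)]
    return cmd


cmd = build_train_command(hyperparams)
print(shlex.join(cmd))
# To actually launch training: subprocess.run(cmd, check=True)
```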

🏁 Evaluation
```bash
python experiments/scripts_Seal/extract_explanations.py --explainer_type SEAL --explainer_path ./default_cyp.pth --save_path ./default_cyp_explanations.pth

python experiments/scripts_Seal/evaluate_explanations.py --explanations_path ./default_cyp_explanations.pth --save_path ./default_cyp_evaluation.csv
```
🏁 Synthetic datasets evaluation
```bash
python experiments/scripts_Seal/extract_explanations_synthetic.py --explainer_type SEAL --explainer_path ./default_rings-max.pth --save_path ./default_rings-max_explanations.pth

python experiments/scripts_Seal/evaluate_explanations_synthetic.py --explanations_path ./default_rings-max_explanations.pth --save_path ./default_rings-max_evaluation.csv
```

🏁 Evaluation of other explainers
```bash
python experiments/scripts/train_model.py --data-set cyp --task classification --epochs 1 --save_path ./default_bxaic_cyp.pth

python experiments/scripts/extract_explanations.py --explainer_type Saliency --model_path ./default_bxaic_cyp.pth --save_path ./default_bxaic_cyp_Saliency_explanations.pth

python experiments/scripts/evaluate_explanations.py --explanations_path ./default_bxaic_cyp_Saliency_explanations.pth --save_path ./default_bxaic_cyp_Saliency_eval.csv

```
🏁 Synthetic datasets evaluation of other explainers
```bash
python experiments/scripts/train_model_synthetic.py --data-set rings-max --task classification --epochs 1 --save_path ./default_bxaic_rings-max.pth

python experiments/scripts/extract_explanations_synthetic.py --explainer_type Deconvolution --model_path ./default_bxaic_rings-max.pth --save_path ./default_bxaic_rings-max_Deconvolution_explanations.pth

python experiments/scripts/evaluate_explanations_synthetic.py --explanations_path ./default_bxaic_rings-max_Deconvolution_explanations.pth --save_path ./default_bxaic_rings-max_Deconvolution_eval.csv

```

For HiGNN evaluation, please see:
https://github.com/idruglab/hignn

For ProtGNN evaluation, please see:
https://github.com/zaixizhang/ProtGNN

Note: The above commands are examples; adjust the parameters to suit your needs.
