
<p align="center">
    <img src="docs/cbg_logo.png" width="400" class="center" alt="CBGBench Logo"/>
    <br/>
</p>


*CBGBench: Complex Binding Graph Benchmark* is a benchmark for generative target-aware molecule design.


## Introduction

This is the official code repository for the paper 'CBGBench: Fill in the Blank of Protein-Molecule Binding Graph', which aims to unify target-aware molecule design within a single code implementation. So far, we have included 7 methods, listed below:

| Model | Paper link  |
|------------|---------------------------------------------|
| Pocket2Mol | https://arxiv.org/abs/2205.07249  |
| GraphBP | https://arxiv.org/abs/2204.09410 | 
| DiffSBDD | https://arxiv.org/abs/2210.13695 |
| DiffBP | https://arxiv.org/abs/2211.11214 | 
| TargetDiff | https://arxiv.org/abs/2303.03543 |
| FLAG | https://openreview.net/forum?id=Rq13idF0F73 | 
| D3FG | https://arxiv.org/abs/2306.13769 | 

These models were originally developed for `de novo` molecule generation; we extend them to four more tasks: `linker design`, `fragment growing`, `scaffold hopping`, and `side chain decoration`.
<p align="center">
    <img src="scripts/tasks.jpg" width="600" class="center" alt="Extend Tasks"/>
    <br/>
</p>

## Installation

#### Create environment with basic packages.

```
conda env create -f environment.yml
conda activate cbgbench
```

#### Install pytorch and torch_geometric

```
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia
conda install pyg pytorch-scatter pytorch-cluster -c pyg
```

#### Install tools for chemistry

```
# install rdkit, efgs, obabel, etc.
pip install --use-pep517 EFGs
pip install biopython
pip install lxml
conda install rdkit openbabel tensorboard tqdm pyyaml easydict python-lmdb -c conda-forge

# install plip
mkdir tools
cd tools
git clone https://github.com/pharmai/plip.git
cd plip
python setup.py install
alias plip='python plip/plip/plipcmd.py'
cd ..

# Note: if setup.py reports an error, it can be ignored as long as openbabel is installed.

# install docking tools
conda install -c conda-forge numpy swig boost-cpp sphinx sphinx_rtd_theme
python -m pip install git+https://github.com/Valdes-Tresanco-MS/AutoDockTools_py3
pip install meeko==0.1.dev3 scipy pdb2pqr vina

# If you are unable to install vina, you can try: conda install vina
# If you encounter the following error:
# ImportError: libtiff.so.5: cannot open shared object file: No such file or directory
# Please try the following steps to resolve it:
pip uninstall pillow
pip install pillow
```
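
Once the steps above have finished, a short check like the following can show which packages are still missing. This is a minimal sketch, not part of the repository: the list of import names is our assumption about how the packages installed above are exposed in Python (for example, `biopython` imports as `Bio` and `pyyaml` as `yaml`), and `missing_modules` is a hypothetical helper. It only checks that each module can be located, not that its version matches.

```python
import importlib.util

# Import names assumed for the packages installed above (not an official list).
EXPECTED_MODULES = [
    "torch", "torch_geometric", "torch_scatter", "torch_cluster",
    "rdkit", "openbabel", "Bio", "lxml", "yaml", "easydict", "lmdb",
    "tqdm", "tensorboard", "meeko", "vina",
]


def missing_modules(names):
    """Return the names that the import system cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A broken or partially installed package may raise instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(EXPECTED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All expected modules were found.")
```

Run it inside the activated `cbgbench` environment; any name it prints points to an installation step that may need to be repeated.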

## Prepare Dataset

### CrossDocked2020

#### Download raw data and prepare from scratch

(i) Download `crossdocked_v1.1_rmsd1.0.tar.gz` from [TargetDiff Drive](https://drive.google.com/drive/folders/1j21cc7-97TedKh_El5E34yI8o5ckI7eK), and extract it into `./raw_data/` with

```
mkdir raw_data
tar -xzvf crossdocked_v1.1_rmsd1.0.tar.gz -C ./raw_data
```

(ii) Run the following:

```
python ./scripts/extract_pockets.py --source raw_data/crossdocked_v1.1_rmsd1.0 --dest raw_data/crossdocked_v1.1_rmsd1.0_pocket10
```
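
To give an idea of what pocket extraction involves, here is a minimal sketch of a distance-based selection. It is not the code in `scripts/extract_pockets.py`: we assume, based only on the `pocket10` suffix, that a pocket consists of the residues with at least one atom within 10 Å of the ligand. The function name, the input layout, and the coordinates are hypothetical.

```python
import math


def select_pocket_residues(protein_atoms, ligand_coords, radius=10.0):
    """Return ids of residues with at least one atom within `radius` of any ligand atom.

    protein_atoms: iterable of (residue_id, (x, y, z)) tuples.
    ligand_coords: iterable of (x, y, z) tuples.
    The 10.0 default is an assumption, not a value taken from the repository.
    """
    ligand_coords = list(ligand_coords)
    selected = set()
    for residue_id, coord in protein_atoms:
        if residue_id in selected:
            continue  # residue already kept, no need to test its other atoms
        if any(math.dist(coord, lig) <= radius for lig in ligand_coords):
            selected.add(residue_id)
    return selected


# Toy example with made-up coordinates (in angstroms).
protein = [
    ("A:10", (0.0, 0.0, 0.0)),
    ("A:10", (1.5, 0.0, 0.0)),
    ("A:55", (30.0, 0.0, 0.0)),
]
ligand = [(5.0, 0.0, 0.0)]
print(sorted(select_pocket_residues(protein, ligand)))  # ['A:10']
```

Keeping whole residues rather than individual atoms means the extracted pocket remains a set of complete residues.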

(iii) The dataset for each task is prepared automatically during training; see the `Training` section below.


## Training

#### If you want to train from scratch, you can run the following

```
python train.py --config ./configs/{task}/train/{method}.yml --logdir ./logs/{task}/{method}
```

`{task}` can be `denovo`, `linker`, `frag`, `scaffold`, or `sidechain`, and `{method}` is the model name as written in the table below, which lists every supported method-task pair and the values to substitute.

Table: Method-task pairs that can be trained from scratch.

| Method | Task | {method} + {task} |
|------------|-----------------------|-----------------------------------------|
| Pocket2Mol | de novo | pocket2mol + denovo |
| GraphBP | de novo | graphbp + denovo |
| DiffSBDD | de novo | diffsbdd + denovo |
| DiffBP | de novo | diffbp + denovo |
| TargetDiff | de novo | targetdiff + denovo |
| FLAG | de novo | flag + denovo |
| D3FG | de novo | d3fg_fg + denovo ; d3fg_linker + denovo |
| Pocket2Mol | linker design | pocket2mol + linker |
| GraphBP | linker design | graphbp + linker |
| DiffSBDD | linker design | diffsbdd + linker |
| DiffBP | linker design | diffbp + linker |
| TargetDiff | linker design | targetdiff + linker |
| Pocket2Mol | fragment growing | pocket2mol + frag |
| GraphBP | fragment growing | graphbp + frag |
| DiffSBDD | fragment growing | diffsbdd + frag |
| DiffBP | fragment growing | diffbp + frag |
| TargetDiff | fragment growing | targetdiff + frag |
| Pocket2Mol | scaffold hopping | pocket2mol + scaffold |
| GraphBP | scaffold hopping | graphbp + scaffold |
| DiffSBDD | scaffold hopping | diffsbdd + scaffold |
| DiffBP | scaffold hopping | diffbp + scaffold |
| TargetDiff | scaffold hopping | targetdiff + scaffold |
| Pocket2Mol | side chain decoration | pocket2mol + sidechain |
| GraphBP | side chain decoration | graphbp + sidechain |
| DiffSBDD | side chain decoration | diffsbdd + sidechain |
| DiffBP | side chain decoration | diffbp + sidechain |
| TargetDiff | side chain decoration | targetdiff + sidechain |
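
The substitution rule can also be written down as a small helper. The following is a sketch that only builds the training command for a pair from the table above and rejects pairs that are not listed; the function names are hypothetical and nothing here is part of `train.py`.

```python
# Tasks and method names as listed in the table above.
TASKS = ["denovo", "linker", "frag", "scaffold", "sidechain"]
ALL_TASK_METHODS = ["pocket2mol", "graphbp", "diffsbdd", "diffbp", "targetdiff"]
DENOVO_ONLY_METHODS = ["flag", "d3fg_fg", "d3fg_linker"]


def supported_tasks(method):
    """Return the tasks the table lists for `method`."""
    if method in ALL_TASK_METHODS:
        return list(TASKS)
    if method in DENOVO_ONLY_METHODS:
        return ["denovo"]
    raise ValueError(f"unknown method: {method!r}")


def train_command(method, task):
    """Build the argument list for training one method-task pair."""
    if task not in supported_tasks(method):
        raise ValueError(f"{method!r} is not listed for task {task!r}")
    return [
        "python", "train.py",
        "--config", f"./configs/{task}/train/{method}.yml",
        "--logdir", f"./logs/{task}/{method}",
    ]


if __name__ == "__main__":
    print(" ".join(train_command("targetdiff", "linker")))
```

The returned list can be passed to `subprocess.run` from the repository root, or joined with spaces and pasted into a shell.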

Note that D3FG and FLAG are not compatible with the extended tasks. D3FG also uses a two-stage training strategy, so to train it yourself you need to run both of the following:

```
python train.py --config ./configs/denovo/train/d3fg_fg.yml --logdir ./logs/denovo/d3fg_fg
python train.py --config ./configs/denovo/train/d3fg_linker.yml --logdir ./logs/denovo/d3fg_linker
```

## Generation on test sets

Once a model is trained, you can draw samples from it on the test pockets with the following:

```
bash generate.sh --method {method} --task {task} --tag {tag} --checkpoint {ckpt_number}
```

In the command, the valid `{method}` and `{task}` pairs are those listed in the table of method-task pairs above. `{tag}` should be replaced with `selftrain` or `pretrain`, according to the checkpoints you use.
If `--checkpoint` is given without a number, the script automatically picks the latest `.pt` file. If a number is given, it uses the checkpoint file with that number.

For example, to generate samples on the test set for _de novo_ design with a _self-trained_ _targetdiff_ model, run:

```
bash generate.sh --method targetdiff --task denovo --tag selftrain --checkpoint
```

Or, to use checkpoint number 100000 instead:

```
bash generate.sh --method targetdiff --task denovo --tag selftrain --checkpoint 100000
```
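
The checkpoint selection rule described above can be sketched as follows. This is an illustration, not the logic in `generate.sh`: we assume that checkpoints are stored as `<number>.pt` in a single directory and that "latest" means the highest number; `resolve_checkpoint` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path


def resolve_checkpoint(ckpt_dir, number=None):
    """Return the checkpoint for `number`, or the highest-numbered `.pt` file if None."""
    ckpt_dir = Path(ckpt_dir)
    if number is not None:
        path = ckpt_dir / f"{number}.pt"
        if not path.is_file():
            raise FileNotFoundError(path)
        return path
    # Compare file names as integers: as strings, "99000" would sort after "100000".
    numbered = [
        (int(path.stem), path)
        for path in ckpt_dir.glob("*.pt")
        if re.fullmatch(r"\d+", path.stem)
    ]
    if not numbered:
        raise FileNotFoundError(f"no numbered .pt files in {ckpt_dir}")
    return max(numbered, key=lambda item: item[0])[1]


# Demonstration on empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for step in (5000, 99000, 100000):
        (Path(tmp) / f"{step}.pt").touch()
    print(resolve_checkpoint(tmp).name)        # 100000.pt
    print(resolve_checkpoint(tmp, 5000).name)  # 5000.pt
```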

Note that `D3FG` uses a two-step generation strategy, so both of the following commands are needed:

```
bash generate.sh --method d3fg_fg --task denovo --tag {tag} --checkpoint  # generate the functional groups
bash generate.sh --method d3fg_linker --task denovo --tag {tag} --checkpoint # generate the linkers
```

## Evaluation

Run the following:

```
cd evaluate_scripts
bash evaluate.sh --method {method} --task {task} --tag {tag}
```

For example, if you have trained TargetDiff yourself, run

```
bash evaluate.sh --method targetdiff --task denovo --tag selftrain
bash evaluate.sh --method targetdiff --task frag --tag selftrain
bash evaluate.sh --method targetdiff --task linker --tag selftrain
bash evaluate.sh --method targetdiff --task scaffold --tag selftrain
bash evaluate.sh --method targetdiff --task sidechain --tag selftrain
```

Or, if you have downloaded the pretrained models, run

```
cd evaluate_scripts
bash evaluate.sh --method {method} --task {task} --tag pretrained
```
