## Code Implementation for RL4SBDD-M
This is the code implementation for the paper "RL4SBDD: Reinforcement Learning for Preference Alignment in Structure-based Drug Design".

### Installation
The code is built on the open-source project [OpenBioMed](https://github.com/PharMolix/OpenBioMed). Follow the installation instructions in the OpenBioMed repository to set up the environment.

### Data Preparation
The processed CrossDocked2020 dataset can be downloaded [here](https://figshare.com/articles/dataset/crossdocked_pocket10_with_protein_tar_gz/25878871). Extract the archive and place the `crossdocked_pocket10_with_protein` folder in the `data` folder. Download `split_by_name.pt` [here](https://drive.google.com/drive/folders/1CzwxmTpjbrt83z_wBzcQncq84OVDPurM) and `test_set.zip` [here](https://drive.google.com/drive/folders/1j21cc7-97TedKh_El5E34yI8o5ckI7eK), and place both in the `data` folder.
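
Before sampling or training, it can help to confirm that the `data` folder contains everything listed above. The following is a minimal sketch, not part of the repository: the helper name `missing_data_entries` is hypothetical, and it only checks that the three entries named in this section exist. Whether `test_set.zip` also needs to be extracted is not stated here, so the sketch checks only for the archive itself.

```python
from pathlib import Path

# Entries that the Data Preparation step says should be in `data/`.
EXPECTED = [
    "crossdocked_pocket10_with_protein",
    "split_by_name.pt",
    "test_set.zip",
]


def missing_data_entries(data_dir="data"):
    """Return the expected entries that are absent from data_dir (hypothetical helper)."""
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_data_entries()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("Data folder looks complete.")
```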

### Sampling and Evaluation
We provide the model checkpoints after the 2nd round of iterative RL under the `checkpoints` folder. To run sampling and evaluation, use the following commands:

```bash
bash scripts/sample/rl4sbdd-m.sh # Sampling with 10 molecules per pocket
bash scripts/sample/rl4sbdd-m_repeated.sh # Sampling with 100 samples per pocket
```

### Sampling Results
We provide the sampling and evaluation results of RL4SBDD-M under the `data/sample_results` folder.

### Training
To initialize the policy and value models, first process the training set with the following command:
```bash
python data_preparation/process_csd.py
```
Set `train_cutoff` and `val_cutoff` to 99800 and 99900, respectively, in both `configs/train/molcraft_cfg.yaml` and `configs/train/critique_sbdd.yaml`. Then, run the following commands:
```bash
bash scripts/train/train_critique_sbdd.sh
bash scripts/train/train_molcraft_cfg.sh
```
The checkpoints of the policy and value models will be saved in the `lightning_logs` folder.
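
To locate the saved checkpoints, a short script can list every `.ckpt` file under `lightning_logs`, newest first. This is a sketch under assumptions: the helper name `list_checkpoints` is hypothetical, and the `.ckpt` extension is PyTorch Lightning's default rather than something this README states. Because both the policy and value models save to the same folder, inspect the paths rather than blindly taking the newest one.

```python
from pathlib import Path


def list_checkpoints(log_dir="lightning_logs"):
    """Return all .ckpt files under log_dir, most recently modified first.

    Hypothetical helper; assumes checkpoints use the `.ckpt` extension.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    ckpts = list(root.rglob("*.ckpt"))
    # Sort by modification time so the latest training run comes first.
    return sorted(ckpts, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for ckpt in list_checkpoints():
        print(ckpt)
```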

For the iterative RL, we provide the following script for parallel sampling on the training set:
```bash
bash scripts/run_train_sampling_dist.sh
```
For each iteration, update two settings before sampling: set the `model_ckpt_path` argument inside the bash script to the policy model checkpoint from the previous iteration, and set `value_model_ckpt` in `configs/sample/Mixed_CG_CFG.yaml` to the value model checkpoint from the previous iteration.

Then, run the following commands to filter the sampled molecules, compute their rewards, and post-process them:
```bash
python data_preparation/merge_filter_data.py
python data_preparation/compute_train_reward.py
python data_preparation/postprocess_train.py
```
The last script reports the train cutoff and val cutoff. Set `train_cutoff` and `val_cutoff` in `configs/train/molcraft_cfg.yaml` and `configs/train/critique_sbdd.yaml` to these values for the next iteration of training.
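
Since the same two values must be written to two config files on every iteration, the update can be scripted. The sketch below is not part of the repository: `set_cutoffs` is a hypothetical helper, and it assumes each key appears in the YAML as a single `train_cutoff: <value>` or `val_cutoff: <value>` line (at any indentation). It edits the text directly so comments and formatting in the file are preserved.

```python
import re
from pathlib import Path

# Config files named in this README.
CONFIGS = [
    "configs/train/molcraft_cfg.yaml",
    "configs/train/critique_sbdd.yaml",
]


def set_cutoffs(path, train_cutoff, val_cutoff):
    """Rewrite the `train_cutoff:` and `val_cutoff:` values in one YAML file.

    Hypothetical helper; assumes each key sits on its own `key: value` line.
    Returns the number of lines changed.
    """
    text = Path(path).read_text()
    total = 0
    for key, value in (("train_cutoff", train_cutoff), ("val_cutoff", val_cutoff)):
        # Match the key at the start of a line and replace only the value token,
        # leaving indentation and any trailing comment untouched.
        pattern = re.compile(rf"^([ \t]*{key}[ \t]*:[ \t]*)\S+", flags=re.MULTILINE)
        text, n = pattern.subn(rf"\g<1>{value}", text)
        total += n
    Path(path).write_text(text)
    return total
```

For example, calling `set_cutoffs(cfg, 99800, 99900)` for each `cfg` in `CONFIGS` applies the initialization values given above; for later iterations, pass the cutoffs reported by the last post-processing script instead. A return value of 0 means the keys were not found in the assumed form, and the file should be edited by hand.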

NOTE: The experiments were conducted on NVIDIA A800 GPUs with 80GB memory. If you encounter out-of-memory errors, reduce the batch size in the configs.