# Code repository for "Multi-domain Distribution Learning for De Novo Drug Design"

This repository contains the training and sampling scripts to reproduce the results of our ICLR 2025 submission.

## Environment

Create a conda/mamba environment:
```bash
conda env create -f environment.yaml -n drugflow
conda activate drugflow
```

Then add the Gnina executable, which is used to compute docking scores:
```bash
wget https://github.com/gnina/gnina/releases/download/v1.1/gnina -O $CONDA_PREFIX/bin/gnina
chmod +x $CONDA_PREFIX/bin/gnina
```
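
As an optional sanity check, the snippet below is a minimal sketch that confirms the executable is reachable from the active environment. The helper `check_executable` is a hypothetical name introduced here for illustration and is not part of this repository.

```bash
# Hypothetical helper (not part of this repository):
# report whether a program can be found on PATH.
check_executable() {
    name="$1"
    if command -v "$name" >/dev/null 2>&1; then
        echo "found: $(command -v "$name")"
    else
        echo "missing: $name"
        return 1
    fi
}

# Run this with the drugflow environment activated.
check_executable gnina || echo "gnina not found; re-check the download step above"
```

If the check reports `missing`, the most likely causes are that the environment was not activated before the download (so `$CONDA_PREFIX` was empty) or that the `chmod +x` step was skipped.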

## Dataset preparation

### Pre-processed dataset
The preprocessed dataset is available on Zenodo:
```bash
wget https://zenodo.org/records/13871375/files/processed_crossdocked.zip
unzip processed_crossdocked.zip
```

### (Optional) running pre-processing locally

To process the raw dataset locally, first download and extract the CrossDocked dataset as described by the authors of Pocket2Mol: https://github.com/pengxingang/Pocket2Mol/tree/main/data.

Specify the input and output directories:
```bash
CROSSDOCKED_DATA=...  # location at which the dataset was extracted
PROCESSED_DATA=...  # location at which the processed dataset will be stored
```

Then, preprocess the data for DrugFlow:
```bash
python src/data/process_crossdocked.py $CROSSDOCKED_DATA \
       --outdir $PROCESSED_DATA \
       --flex
```

## Training

Example config files are provided for:
- DrugFlow: `CONFIG=configs/training/drugflow.yml`
- FlexFlow: `CONFIG=configs/training/flexflow.yml`
- Preference alignment: `CONFIG=configs/training/preference_alignment.yml`

Create symlinks to the processed dataset and to the log directory:
```bash
LOGDIR=...  # where checkpoints and validation outputs will be saved
ln -s $PROCESSED_DATA processed_crossdocked
ln -s $LOGDIR runs
```
Alternatively, you can change the corresponding paths in the config files.
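
If the symlinks need to be recreated later (for example after moving the dataset), a slightly more defensive variant may help. This is a minimal sketch: `link_dir` is a hypothetical helper introduced here, not a script shipped with this repository. It refuses to link to a directory that does not exist and replaces an existing link instead of failing.

```bash
# Hypothetical helper (not part of this repository):
# create or replace the symlink $2 so that it points at directory $1.
link_dir() {
    target="$1"
    link_name="$2"
    if [ ! -d "$target" ]; then
        echo "error: $target is not a directory" >&2
        return 1
    fi
    # -f replaces an existing link; -n avoids descending into an existing
    # link to a directory and creating a nested link inside it.
    ln -sfn "$target" "$link_name"
}
```

With the variables defined above, the two links would then be created with `link_dir "$PROCESSED_DATA" processed_crossdocked` and `link_dir "$LOGDIR" runs`.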


For example, to launch training for the DrugFlow base model, set `CONFIG=configs/training/drugflow.yml` and run:
```bash
python src/train.py --config $CONFIG
```


## Sampling

Pretrained checkpoints can be downloaded from Zenodo:

```bash
# Base DrugFlow model
wget -P checkpoints/ https://zenodo.org/records/13871375/files/drugflow.ckpt

# DrugFlow + confidence head
wget -P checkpoints/ https://zenodo.org/records/13871375/files/drugflow_ood.ckpt

# FlexFlow
wget -P checkpoints/ https://zenodo.org/records/13871375/files/flexflow.ckpt

# DrugFlow after preference alignment
wget -P checkpoints/ https://zenodo.org/records/13871375/files/drugflow_pa_comb.ckpt
```

The selected checkpoint, e.g. `checkpoints/drugflow.ckpt`, must be specified in `configs/sampling/sample_and_maybe_evaluate.yml`.
To sample with your own trained model, simply provide a custom checkpoint path instead.

You also need to either update the `sample_outdir` parameter in the sampling config file or symlink the desired output location:
```bash
SAMPLE_OUTDIR=...  # where samples will be saved
ln -s $SAMPLE_OUTDIR samples
```

To sample, run:
```bash
python src/sample_and_evaluate.py --config configs/sampling/sample_and_maybe_evaluate.yml
```
The script supports parallelization across target pockets via the `--job_id` and `--n_jobs` arguments.
To also evaluate the results, set `evaluate: True` in the sampling config file.
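
One way to use the parallelization options is to generate one command per job and hand each to a scheduler or a separate shell. The sketch below only prints the commands so they can be inspected first. It assumes that job IDs are zero-based and run from `0` to `n_jobs - 1`, which is not confirmed here; check the argument handling in `src/sample_and_evaluate.py` before relying on it. `print_sampling_commands` is a hypothetical helper name.

```bash
# Hypothetical helper (not part of this repository):
# print one sampling command per job.
# ASSUMPTION: job IDs are zero-based (0 .. n_jobs-1).
print_sampling_commands() {
    n_jobs="$1"
    config="$2"
    job_id=0
    while [ "$job_id" -lt "$n_jobs" ]; do
        echo "python src/sample_and_evaluate.py --config $config --job_id $job_id --n_jobs $n_jobs"
        job_id=$((job_id + 1))
    done
}

# Print the commands for four jobs without executing them.
print_sampling_commands 4 configs/sampling/sample_and_maybe_evaluate.yml
```

Each printed line can then be submitted as its own cluster job, or run in a separate terminal on a multi-GPU machine.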
