# FragFormer: A Fragment-Based Representation Learning Framework for Molecular Property Prediction


![FragFormer pipeline](figures/pipeline.png)

*This repository contains partial code for FragFormer. The full code will be released soon.* \
Prepare the environment:
```bash
pip install -r requirements.txt
```

### Pre-training Stage 

We provide the vocabulary learned by principal subgraph mining in `src/data/ps/chembl29_vocab_order1_500.txt`.

First, download the ChEMBL29 dataset. Then preprocess the dataset (fragmentation) and save it to a local directory:
```bash 
python preprocess.py
```

Run pre-training: 
```bash
cd scripts 
```
```bash 
CUDA_VISIBLE_DEVICES=0,1 python -u -m torch.distributed.run \
    --nproc_per_node=2 --nnodes=1 --master_port 12102 \
    train_fragformer.py \
    --save_path ../models/pretrained/base_0.3 \
    --n_threads 8 --n_devices 2 --n_steps 25000 \
    --mask_rate 0.3 --d_model 512 --n_mol_layers 6 \
    --attn_drop 0.1 --feat_drop 0.1 --batch_size 4096 \
    --knodes md ecfp torsion maccs \
    --vocab_size 500 --order 1
```
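
The pre-training command specifies the GPU count in three places: `CUDA_VISIBLE_DEVICES=0,1`, `--nproc_per_node=2`, and `--n_devices 2`. The sketch below assumes these values are meant to agree (this is inferred from the command, not confirmed by the code) and derives the count from the GPU list so it only has to be edited once. `count_devices`, `GPUS`, and `N_DEVICES` are hypothetical names used for illustration; they are not part of FragFormer.

```bash
# Count the comma-separated GPU ids in a CUDA_VISIBLE_DEVICES-style string.
# Hypothetical helper, not part of the FragFormer scripts.
# The body runs in a subshell so IFS and globbing changes do not leak out.
count_devices() (
    ids="$1"
    # An empty list means no visible devices.
    [ -n "$ids" ] || { echo 0; exit 0; }
    IFS=','
    set -f          # disable globbing while word-splitting
    set -- $ids     # split on commas into positional parameters
    echo "$#"
)

GPUS="0,1"
N_DEVICES=$(count_devices "$GPUS")
echo "GPUs: $GPUS -> nproc_per_node=$N_DEVICES, n_devices=$N_DEVICES"
```

The derived value could then be passed as `--nproc_per_node=$N_DEVICES` and `--n_devices $N_DEVICES`, with `CUDA_VISIBLE_DEVICES=$GPUS` set in front of the command.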


### Fine-tuning Stage

First, download the PharmaBench dataset. Then preprocess the dataset (fragmentation) and save it to a local directory:
```bash
python preprocess_downstream.py 
```

Run fine-tuning: 
```bash
cd scripts 
```
```bash
CUDA_VISIBLE_DEVICES=0 python finetune_fragformer.py \
    --model_path ../models/pretrained/base_0.3/base_50_25000.pth \
    --dataset pharm_ames \
    --weight_decay 0 --dropout 0 --lr 3e-5 --warmup \
    --d_model 512 --n_mol_layers 6 \
    --epochs 50 --warmup_epochs 5 \
    --knodes md ecfp torsion maccs \
    --vocab_size 500 --order 1
```
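
Fine-tuning reads the checkpoint passed via `--model_path`, so it can help to confirm the file exists before launching a run. The following is a minimal sketch: `require_file` and `CKPT` are hypothetical names, and the path is copied from the command above. Whether pre-training writes a file with exactly this name is an assumption, so adjust it to match your own output directory.

```bash
# Return non-zero and print a message to stderr if the given file is missing.
# Hypothetical helper, not part of the FragFormer scripts.
require_file() {
    if [ ! -f "$1" ]; then
        echo "missing file: $1" >&2
        return 1
    fi
}

# Path taken from the fine-tuning command; relative to the scripts directory.
CKPT="../models/pretrained/base_0.3/base_50_25000.pth"
if require_file "$CKPT"; then
    echo "checkpoint found: $CKPT"
else
    echo "run pre-training first, or adjust --model_path"
fi
```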

### Explainability Analysis

```bash
cd scripts 
```

```bash
CUDA_VISIBLE_DEVICES=1 python explain_fragformer.py \
    --model_path ../models/finetune/mutag.pt \
    --dataset mutag \
    --weight_decay 0 --dropout 0 --lr 3e-5 --warmup \
    --d_model 512 --n_mol_layers 6 \
    --warmup_epochs 5 \
    --order 1 --vocab_size 500
```