## Code for InstructPLM-DPO
The code includes **three** major parts:
- [Model definition](opensource/InstructPLM) 
- [Trainer](opensource/dpo_trainer.py)  
- [Dataset](opensource/construct_dataset.py)

### Model definition
We mainly modify the tokenization file of the original [InstructPLM](https://github.com/Eikor/InstructPLM) implementation so that it supports DPO training.

### Trainer
We make a few modifications to the [TRL DPO trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer) to support chosen-sample-only training (referred to as SFT training in the paper).
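
To illustrate the difference between the two training modes, here is a minimal sketch that contrasts the standard DPO objective with a chosen-sample-only objective. It is not the code in `dpo_trainer.py`: the function names, the `beta` default, and the per-token normalization of the chosen-only loss are all assumptions made for illustration, and the inputs are plain summed sequence log-probabilities rather than model outputs.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    All inputs are summed sequence log-probabilities. `beta` is an
    illustrative default, not the value used in the experiments.
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    return -log_sigmoid(beta * (chosen_logratio - rejected_logratio))


def chosen_only_loss(policy_chosen: float, num_tokens: int) -> float:
    """Chosen-sample-only (SFT-style) loss.

    Assumption: mean negative log-likelihood of the chosen sequence.
    The rejected sample and the reference model are not used at all.
    """
    return -policy_chosen / num_tokens
```

In the DPO case the loss depends on how much more the policy prefers the chosen sequence than the reference model does, while the chosen-only case reduces to ordinary likelihood training on the chosen sequence.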

### Dataset
We provide the dataset construction script and the [annotation file](opensource/data/cath_train_temp1_100.pkl) (predicted sequences and TM-scores) for the CATH 4.2 training set used in our experiments.
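
As a rough illustration of how predicted sequences and TM-scores can be turned into preference data, the sketch below pairs the best-scoring candidate for a structure with each lower-scoring one. This is an assumed pairing rule, not necessarily what `construct_dataset.py` does: the `(sequence, tm_score)` input layout, the `min_gap` parameter, and the output keys are all hypothetical.

```python
from typing import Dict, List, Tuple, Union


def build_preference_pairs(
    candidates: List[Tuple[str, float]],
    min_gap: float = 0.0,
) -> List[Dict[str, Union[str, float]]]:
    """Build (chosen, rejected) pairs for one structure.

    `candidates` is a hypothetical list of (predicted_sequence, tm_score).
    The highest-scoring sequence is treated as "chosen"; every other
    sequence whose score is lower by more than `min_gap` is "rejected".
    """
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    if not ranked:
        return []

    best_seq, best_score = ranked[0]
    pairs = []
    for seq, score in ranked[1:]:
        gap = best_score - score
        if gap > min_gap:
            pairs.append({"chosen": best_seq, "rejected": seq, "score_gap": gap})
    return pairs


# Toy input with made-up sequences and scores, for illustration only.
example = [("AAA", 0.5), ("CCC", 0.9), ("GGG", 0.7)]
pairs = build_preference_pairs(example)
```

A nonzero `min_gap` would drop pairs whose TM-scores are too close to carry a clear preference signal.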

## Model Training
We provide the training script used in our experiments.

To train the InstructPLM-DPO model:
1. Install the Python environment from `requirements.txt`.
2. Generate the preprocessed structure embedding files with [preprocess.py](https://github.com/Eikor/InstructPLM/blob/main/preprocess.py) and store them in the location passed as `--structure_emb_path`.
3. Construct the dataset by running `python construct_dataset.py`.
4. Update the structure embedding path and the dataset path, then run the training script: `./train.sh`.
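
Before launching training, it can help to confirm that the inputs produced in steps 2 and 3 are actually in place. The helper below is a hypothetical pre-flight check and is not part of this repository. It assumes that the structure embedding location is a directory and that the constructed dataset is a single file or directory whose path you supply yourself.

```python
import os
from typing import List


def find_missing_inputs(structure_emb_path: str, dataset_path: str) -> List[str]:
    """Return human-readable problems found with the training inputs.

    Both arguments are paths chosen by the user; the layout assumed here
    (a non-empty embedding directory plus an existing dataset path) is an
    illustration, not something the training script enforces.
    """
    problems = []

    if not os.path.isdir(structure_emb_path):
        problems.append(f"structure embedding directory not found: {structure_emb_path}")
    elif not os.listdir(structure_emb_path):
        problems.append(f"structure embedding directory is empty: {structure_emb_path}")

    if not os.path.exists(dataset_path):
        problems.append(f"dataset not found: {dataset_path}")

    return problems
```

An empty list means both inputs were found; otherwise each entry names what is missing, which is cheaper to discover here than partway through a training run.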

