
## Introduction

This folder contains:

- train_dpo.py: the main training script (supports both SFT and DPO preference-based training)
- dataset.py: dataset processing logic for both SFT and DPO preference-based training (for Baichuan series models)
- utils.py: helper functions shared by the other files
- train_dpo.sh: the launch script; edit it to match your setup before running the experiment
- data: folder containing the data used for training

## Environment
Our experiments were conducted in the following environment:  
python == 3.9.16  
accelerate == 0.27.2  
flash-attn == 2.5.6  
torch == 2.0.1  
transformers == 4.37.2  
deepspeed == 0.13.4  
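
The pinned versions above can be compared against the active environment before launching training. The following is a minimal sketch, not part of this repository: `check_environment` and `PINNED` are hypothetical names, and it assumes the installed distribution names match those listed above. The Python interpreter version is not covered by this check.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the Environment section of this README.
PINNED = {
    "accelerate": "0.27.2",
    "flash-attn": "2.5.6",
    "torch": "2.0.1",
    "transformers": "4.37.2",
    "deepspeed": "0.13.4",
}


def check_environment(pinned=PINNED, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        # A local build tag such as "2.0.1+cu118" is treated as matching "2.0.1".
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_environment().items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

If the script prints nothing, every listed package is installed at the pinned version. Other versions may also work; these are only the ones the experiments were run with.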


## Quick start
After adjusting train_dpo.sh for your setup, run the command ```bash train_dpo.sh```
