IRPM training consists of four steps.

## 1. Prepare the HelpSteer3 data
Download the [HelpSteer3](https://huggingface.co/datasets/nvidia/HelpSteer3) dataset to the current folder, then run the preprocessing scripts:

```shell
python gen_train_data.py
python trans_pub.py
```
When both scripts finish, the training data is written to `data/verl_steer3filter_prompt4_x2_score`.
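
As a quick sanity check before moving on, the sketch below tests whether the expected output folder exists and is non-empty. The helper name `check_data_dir` is hypothetical and not part of this repository, and the path assumes the preprocessing scripts were run from the current folder.

```shell
# Hypothetical helper (not part of this repo): report whether a
# directory exists and contains at least one entry.
check_data_dir() {
    dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "OK: $dir"
    else
        echo "MISSING: $dir"
        return 1
    fi
}

# Folder name taken from the preprocessing step above.
check_data_dir "data/verl_steer3filter_prompt4_x2_score" || true
```

If the check prints `MISSING`, rerunning the two preprocessing scripts and reading their output is probably the first thing to try.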

## 2. Prepare the model
Please download the model to the current folder.

## 3. Prepare the VERL framework
Our implementation is based on the VERL framework, so you need to set up VERL first:

```shell
git clone https://github.com/volcengine/verl.git
cd verl
git checkout v0.6.x

# Core reward design
cp ../group_batch_pub_steer4.py ./custom_reward/
# The reward must be computed along the batch dimension
cp ../batch2.py ./workers/reward_manager/
cp ../requirements.txt ./
cp -r ../misc ./
cp ../train.sh ./
```
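
To confirm that the copy step worked, a check along these lines can be run from inside the `verl` directory. It only looks for the destination paths used in the `cp` commands above; `check_copied` is a hypothetical helper name, not something provided by VERL or this repository.

```shell
# Hypothetical helper: print each expected path that is missing and
# return non-zero if any of them is absent.
check_copied() {
    missing=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Destination paths from the cp commands above (run inside verl/).
check_copied \
    custom_reward/group_batch_pub_steer4.py \
    workers/reward_manager/batch2.py \
    requirements.txt \
    misc \
    train.sh \
    && echo "all files in place" || true
```

A `missing:` line usually means the destination directory did not exist when `cp` ran, or the command was run from a different folder.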

## 4. Start training
Configure the relevant settings in train.sh, then start training:

```shell
# Start training
sh train.sh
```