# Official Implementation of NeurIPS 2025 Submission: Simplify RLHF as Reward-Weighted SFT: A Variational Method

## Training

To train the model, run the following command:

```
bash train.sh
```
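
For orientation, the idea named in the title can be sketched in a few lines. This is a generic illustration of reward-weighted SFT, not necessarily the objective that `train.sh` optimizes. The function names, the softmax weighting, and the `temperature` parameter are all assumptions made for this example.

```python
import math


def reward_weights(rewards, temperature=1.0):
    """Turn raw rewards into normalized weights (hypothetical helper).

    Uses a numerically stable softmax of rewards / temperature.
    """
    scaled = [r / temperature for r in rewards]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def reward_weighted_sft_loss(log_probs, rewards, temperature=1.0):
    """Weighted negative log-likelihood over a batch of responses.

    log_probs[i] is the model's sequence log-probability of response i,
    rewards[i] is the reward-model score for that response.
    """
    weights = reward_weights(rewards, temperature)
    return -sum(w * lp for w, lp in zip(weights, log_probs))
```

With equal rewards the weights are uniform and the loss reduces to the ordinary mean SFT loss. As rewards diverge, higher-scoring responses contribute more to the loss.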

## Evaluation during training

To set up the evaluation reward model so that it can be queried via API calls, run:

```
bash develop_reward_api_oasst.sh
```
