```
RLHF with PPO, AWR, and APA.

# Installation:

pip install -e 

# Launch training:
accelerate launch --config_file configs/accelerate/zero2-bf16.yaml examples/script/ppo_hh.py

Replace ppo_hh with sppo_hh, ppo_tldr, or sppo_tldr to run other methods or datasets. The sppo scripts offer a choice of loss: square loss for APA or log loss for AWR.
```
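
The script swap described above can be sketched as a small shell helper. This is only an illustration: `launch_cmd` is a hypothetical function, not part of the repository. The config path and script directory are copied from the launch command above, and it assumes each script name maps to `examples/script/<name>.py`.

```shell
# Hypothetical helper (not part of the repo): print the launch command
# for one of the script names listed in the README.
launch_cmd() {
  case "$1" in
    ppo_hh|sppo_hh|ppo_tldr|sppo_tldr) ;;                 # known script names
    *) echo "unknown script: $1" >&2; return 1 ;;          # reject anything else
  esac
  # Assumption: every script lives under examples/script/ with a .py suffix.
  echo "accelerate launch --config_file configs/accelerate/zero2-bf16.yaml examples/script/$1.py"
}

# Print the command for the sppo_tldr script.
launch_cmd sppo_tldr
```

Printing the command rather than running it makes it easy to check the path before starting a training job.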