## Proxy Model Training

To reproduce the reward models reported in the paper, run `launch.sh` to finetune the base proxy models of different sizes. To train the proxies with SAM, set `--sam 1` in `tune_accelerate.sh`.
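
The workflow above can be sketched as the short shell session below. It is a minimal sketch, not the authors' documented invocation: it assumes both scripts sit in the current directory (the repository root), that `launch.sh` runs without extra arguments, and that `launch.sh` picks up the settings in `tune_accelerate.sh`. None of these assumptions is confirmed by this README, so check the scripts before relying on it.

```shell
# Assumption: both scripts live in the current directory (the repository root).
script="tune_accelerate.sh"

# Show any line that already sets the SAM flag, so you know what to edit.
if [ -f "$script" ]; then
  grep -n -e "--sam" "$script" || echo "no --sam flag found in $script"
else
  echo "$script not found; run this from the repository root"
fi

# Start finetuning. Assumption: launch.sh needs no extra arguments.
if [ -f launch.sh ]; then
  bash launch.sh
fi
```

If the `grep` step prints nothing but the fallback message, add `--sam 1` to the training command inside `tune_accelerate.sh` before launching.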