RewardEval lets you quickly evaluate any reward model on any preference set.
It also detects when an instruction dataset is passed (by checking that the dataset has `messages` and does not have `chosen`/`rejected`); for these, only the model outputs are logged, not accuracy.
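
As a rough illustration of the dataset check described above, here is a minimal sketch. The function name `is_instruction_dataset` and its argument are hypothetical and are not taken from the RewardEval code; it also assumes that the presence of either `chosen` or `rejected` is enough to treat the data as a preference set.

```python
def is_instruction_dataset(column_names):
    """Hypothetical helper, not part of RewardEval.

    Returns True when the dataset has a `messages` field and
    neither a `chosen` nor a `rejected` field.
    """
    cols = set(column_names)
    # Assumption: either preference field marks the data as a preference set.
    has_preference_fields = "chosen" in cols or "rejected" in cols
    return "messages" in cols and not has_preference_fields


if __name__ == "__main__":
    # Instruction-style data: only model outputs would be logged.
    print(is_instruction_dataset(["messages"]))
    # Preference-style data: accuracy can be computed.
    print(is_instruction_dataset(["prompt", "chosen", "rejected"]))
```

With a Hugging Face dataset, the list of field names could be taken from its `column_names` attribute.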

To run RewardEval, use the following command, substituting the model you would like to evaluate and adding any model-specific parameters.
```
python scripts/run_rewardeval.py --model={yourmodel} --dataset={your_huggingface_dataset} --do_not_save
```
(For the purposes of anonymizing the submission, we recommend running with `--do_not_save`, since we redact the Hugging Face directory for uploading results.)