# Eurus

This is an example of using Eurus-RM ([Yuan et al., 2024](https://arxiv.org/html/2404.02078v1)) to perform best-of-N sampling with Llama-3 8B as the base model.


## Introduction

Eurus-RM-7B is a reward model trained on a mixture of UltraInteract, UltraFeedback, and UltraSafety, with a reward modeling objective designed specifically for reasoning.

According to the authors, Eurus-RM-7B stands out as the best 7B RM overall and achieves similar or better performance than much larger baselines. In particular, it outperforms GPT-4 on certain tasks.

## Running the example

Prerequisites:
- Download the Llama-3 8B model checkpoint.
- Have two GPUs with 24 GB of memory each.

Script:
```bash
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.run --nproc_per_node 1 examples/Eurus/inference.py --model_dir $LLAMA3_CKPTS --best_of_n 10
```
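
For orientation, the selection step behind best-of-N can be sketched in a few lines of Python. This is a minimal illustration, not the code in `examples/Eurus/inference.py`: the names `best_of_n` and `toy_score` are hypothetical, and `toy_score` is a trivial stand-in for a real reward model such as Eurus-RM-7B.

```python
from typing import Callable, List, Tuple


def best_of_n(
    prompt: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the candidate with the highest score, together with that score."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    # Score every candidate response against the same prompt.
    scores = [score_fn(prompt, c) for c in candidates]
    # Pick the index of the highest score (the first one wins on ties).
    best_idx = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_idx], scores[best_idx]


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model (assumption, for illustration).

    It only rewards responses that state a final answer explicitly.
    """
    return 1.0 if "answer is" in response.lower() else 0.0


candidates = [
    "I think it is 5.",
    "2 + 3 = 5, so the answer is 5.",
    "Not sure.",
]
best, score = best_of_n("What is 2 + 3?", candidates, toy_score)
print(best, score)
```

In the actual example, the candidates would be the N reasoning chains sampled from Llama-3 8B, and the scoring function would be replaced by a call to the reward model.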

## Results

We evaluated Eurus-RM-7B by using it to select the best of 10 reasoning chains generated by Llama-3 8B on GSM8K.

|Method|Accuracy|
|-|-|
|CoT (Llama-3 8B)|0.487|
|CoT (Llama-3 8B) + Best-of-10 (Eurus-RM-7B)|0.726|
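
As a rough illustration of how an accuracy figure like the ones above can be computed, the sketch below compares the last number in each selected reasoning chain with the reference answer. It is an assumption about the scoring procedure, not necessarily what the example script does; the helper names and the toy data are hypothetical and unrelated to the reported results.

```python
import re
from typing import List, Optional


def extract_final_number(text: str) -> Optional[str]:
    """Return the last number in the text, with thousands separators removed."""
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions whose final number equals the reference number."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    correct = 0
    for pred, ref in zip(predictions, references):
        num = extract_final_number(pred)
        # Compare numerically so that "18" and "18.0" count as the same answer.
        if num is not None and float(num) == float(ref):
            correct += 1
    return correct / len(references)


# Toy data for illustration only.
predictions = [
    "16 - 3 - 4 = 9 eggs, 9 * 2 = 18, so the answer is 18.",
    "She earns $1,250 in total.",
    "The result is 7.",
]
references = ["18", "1250", "8"]
acc = accuracy(predictions, references)
print(f"accuracy = {acc:.3f}")
```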
 

## Reference
```bibtex
@article{yuan2024advancing,
  title={Advancing LLM Reasoning Generalists with Preference Trees},
  author={Yuan, Lifan and Cui, Ganqu and Wang, Hanbin and Ding, Ning and Wang, Xingyao and Deng, Jia and Shan, Boji and Chen, Huimin and Xie, Ruobing and Lin, Yankai and others},
  journal={arXiv preprint arXiv:2404.02078},
  year={2024}
}
```
