<div align="center">

## Setup


### Train Environment
We verified that the following installation commands work:
```bash
conda create -y -n rlvr_train python=3.10
conda activate rlvr_train
pip install -e .
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121
pip install ray vllm==0.6.3
pip install flash-attn --no-build-isolation
pip install wandb matplotlib
pip install huggingface_hub
```
### Eval Environment
We verified that the following installation commands work:
```bash
conda create -y -n rlvr_eval python=3.10
conda activate rlvr_eval
cd Qwen2.5-Eval/evaluation
cd latex2sympy
pip install -e .
cd ..
pip install -r requirements.txt 
pip install vllm==0.5.1 --no-build-isolation
pip install transformers==4.42.3
pip install wandb matplotlib
# Note: the next two commands override the transformers and vllm versions installed above.
pip install -U transformers
pip install vllm==0.6.3
```



## Training
Before training, set the checkpoint path:
```bash
export CHECKPOINTS_DIR=./checkpoints/ # your checkpoint path
```
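
As an optional pre-flight step, the sketch below creates the checkpoint directory and checks that it is writable before a long run starts. It assumes the training scripts read `CHECKPOINTS_DIR` from the environment; whether they create the directory themselves is not stated here, so treat this as a precaution rather than a required step.

```bash
# Keep an existing CHECKPOINTS_DIR, otherwise fall back to the default above.
export CHECKPOINTS_DIR="${CHECKPOINTS_DIR:-./checkpoints/}"

# Create the directory (and any missing parents); no error if it already exists.
mkdir -p "$CHECKPOINTS_DIR"

# Report whether the current user can write to it.
if [ -w "$CHECKPOINTS_DIR" ]; then
    echo "checkpoints will be written to: $CHECKPOINTS_DIR"
else
    echo "cannot write to: $CHECKPOINTS_DIR" >&2
fi
```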

To run 1-shot RLVR with $\pi_1$, run:
```bash
conda activate rlvr_train
bash scripts/train/training_1.5b_pi1_r128.sh
```

For comparison, the commands for running full-set RLVR on DSR-sub are:
```bash
conda activate rlvr_train
bash scripts/train/training_1.5b_dsr_sub.sh 
```

Please change `data.train_files` and `trainer.experiment_name` in the training script when trying other training examples.
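
To find where these two settings live before editing, a small helper like the one below may be convenient. `show_train_settings` is a hypothetical name introduced here for illustration, not part of the repository; it only prints the matching lines with their line numbers and does not modify the script.

```bash
# Hypothetical helper: print the lines (with line numbers) of a training
# script that mention data.train_files or trainer.experiment_name.
show_train_settings() {
    grep -n -E 'data\.train_files|trainer\.experiment_name' "$1"
}

# Usage, with a script path from the commands above:
# show_train_settings scripts/train/training_1.5b_pi1_r128.sh
```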

## Evaluation

### Eval Scripts
To evaluate 1-shot RLVR with $\pi_1$ on six common math reasoning benchmarks (MATH500, AIME24, AMC23, Minerva Math, OlympiadBench, AIME25), run the following commands:
```bash
conda activate rlvr_eval
cd Qwen2.5-Eval/evaluation
bash sh/eval_one_experiment_all_ckpts.sh
```
For AIME24, AMC23, and AIME25, we evaluate the pass@8 results.
Please adjust the experiment name in `Qwen2.5-Eval/evaluation/sh/eval_one_experiment_all_ckpts.sh` when evaluating runs trained on other training examples.


