# Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning

## Environment

```
conda create -n verl python==3.10
conda activate verl
pip install -e .
pip install flash-attn --no-build-isolation
```

## Dataset

We train our model on the MATH dataset and evaluate on four test sets: MATH500 (a 500-problem subset of the MATH test set, serving as the in-distribution benchmark), GSM8K and AIME2024 (both out-of-distribution), and TheoremQA (targeting symbolic STEM reasoning).

```
# download dataset
python data_process/math_dataset.py --local_dir "/path/to/local/directory"
python data_process/math500.py --local_dir "/path/to/local/directory"
python data_process/gsm8k.py --local_dir "/path/to/local/directory"
python data_process/theoremqa.py --local_dir "/path/to/local/directory"
python data_process/aime.py --local_dir "/path/to/local/directory"
```
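
The five commands above differ only in the script name, so they can be looped over with a single output directory. The following is a minimal sketch, not part of the repository: `download_all`, `DATA_DIR`, and `PYTHON_BIN` are hypothetical names introduced here for illustration, and the sketch assumes it is run from the repository root.

```shell
# Sketch only: run every preprocessing script against one shared directory.
# DATA_DIR and PYTHON_BIN are hypothetical variables, not defined by the repo.
download_all() {
    data_dir="${DATA_DIR:-/path/to/local/directory}"
    python_bin="${PYTHON_BIN:-python}"
    for script in math_dataset math500 gsm8k theoremqa aime; do
        # Stop at the first script that fails.
        "$python_bin" "data_process/${script}.py" --local_dir "$data_dir" || return 1
    done
}
```

For example, `DATA_DIR=/path/to/local/directory download_all` would run all five scripts in order.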

## Training from scratch

- Start the compressor server:
```
python compressor_server.py
```

- Run the training script:

```
sh train/run_bingo_all.sh
```
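
If you prefer to launch both steps from one shell, the sequence above can be sketched as a small wrapper. This is an illustrative sketch under stated assumptions, not a script shipped with the repository: `run_bingo`, `SERVER_CMD`, `TRAIN_CMD`, and `STARTUP_WAIT` are hypothetical names, and the 30-second default wait is a guess at how long the server needs to come up, since the README does not say how to detect readiness.

```shell
# Sketch only: start the compressor server in the background, run training,
# then stop the server. All variable names here are hypothetical.
run_bingo() {
    server_cmd="${SERVER_CMD:-python compressor_server.py}"
    train_cmd="${TRAIN_CMD:-sh train/run_bingo_all.sh}"
    startup_wait="${STARTUP_WAIT:-30}"   # assumed startup time in seconds

    $server_cmd &
    server_pid=$!

    # Give the server time to start; adjust to your setup.
    sleep "$startup_wait"

    # Best-effort check: give up if the server process is already gone.
    if ! kill -0 "$server_pid" 2>/dev/null; then
        echo "compressor server exited before training started" >&2
        return 1
    fi

    $train_cmd
    status=$?

    # Stop the background server and propagate the training exit status.
    kill "$server_pid" 2>/dev/null
    wait "$server_pid" 2>/dev/null
    return "$status"
}
```

Calling `run_bingo` from the repository root would then run both steps. Running the server in a separate terminal, as in the two steps above, remains the simpler option when you want to watch its logs.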