<div id="top">


# RuleReasoner: Reinforced Rule-based Reasoning<br>via Domain-aware Dynamic Sampling


## 📍 TL;DR
Reinforced Rule-based Reasoning (RuleReasoner) is a simple yet effective method for teaching language models rule-based reasoning. Rather than relying on a complex training pipeline, RuleReasoner combines a curated collection of tasks with domain-aware dynamic sampling, which adjusts the training data sampling based on historical performance. Models trained this way outperform frontier Large Reasoning Models (LRMs) by 4.1% on in-distribution tasks and 10.4% on out-of-distribution (OOD) tasks, while also being more computationally efficient.
- Domain-aware dynamic sampling improves training sample efficiency and balances performance across domains.
<img src="assets/training_recipe.jpg" width="80%" style="position: relative; top: 0; right: -0.1cm;" alt="Training recipe"/>

- Comprehensive data curation builds data curricula for rule-centric applications.
<img src="assets/training_data_examples.jpg" width="80%" style="position: relative; top: 0; right: -0.1cm;" alt="Training data examples"/>

- RuleReasoner (8B and 4B) shows competitive in-distribution performance against a wide range of baselines.
<img src="assets/id_performance_comparison.jpg" width="80%" style="position: relative; top: 0; right: -0.1cm;" alt="In-distribution performance comparison"/>

- RuleReasoner (8B and 4B) also achieves strong OOD performance on the rule-based reasoning subsets of three benchmarks: BBH, ProverQA, and BBEH.
<img src="assets/ood_performance_comparison.jpg" width="90%" style="position: relative; top: 0.3cm; right: -0.1cm;" alt="OOD Performance"/>
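
To make the idea of adjusting sampling from historical performance concrete, here is a minimal sketch. It is not the repository's implementation: the `DomainSampler` class, the exponential moving average, the softmax weighting, the default hyperparameters, and the domain names are all assumptions chosen for illustration. The sketch only shows the general pattern of sampling more often from domains whose recent rewards are low.

```python
import math
import random


class DomainSampler:
    """Illustrative domain-aware sampler (hypothetical, not the project's code)."""

    def __init__(self, domains, alpha=0.5, temperature=0.5, seed=0):
        self.domains = list(domains)
        self.alpha = alpha              # EMA smoothing factor (assumed value)
        self.temperature = temperature  # softmax temperature (assumed value)
        self.avg_reward = {d: 0.0 for d in self.domains}
        self.rng = random.Random(seed)

    def update(self, domain, batch_reward):
        """Fold the latest mean reward (expected in [0, 1]) into the history."""
        prev = self.avg_reward[domain]
        self.avg_reward[domain] = self.alpha * batch_reward + (1 - self.alpha) * prev

    def weights(self):
        """Softmax over (1 - average reward): weaker domains get larger weights."""
        scores = [(1.0 - self.avg_reward[d]) / self.temperature for d in self.domains]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        return {d: e / total for d, e in zip(self.domains, exps)}

    def sample_batch(self, batch_size):
        """Draw the domain of each example in the next training batch."""
        w = self.weights()
        return self.rng.choices(
            self.domains, weights=[w[d] for d in self.domains], k=batch_size
        )


# Hypothetical domain names, used only for this example.
sampler = DomainSampler(["logic", "arithmetic", "constraint"])
sampler.update("logic", 0.9)       # the model already does well here
sampler.update("arithmetic", 0.2)  # the model struggles here
next_batch_domains = sampler.sample_batch(8)
```

In this sketch, a domain with a low running reward receives a larger sampling weight, so the next batch is more likely to include its examples. The temperature controls how sharply the sampler favors weak domains.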


## 🗺️ Table of Contents

- [TL;DR](#-tldr)
- [Table of Contents](#%EF%B8%8F-table-of-contents)
- [Quick Start](#-quick-start)
    - [Prerequisites](#prerequisites)
    - [Installation](#installation)
    - [Training](#training)
    - [Evaluation](#evaluation)
- [Project Structure](#-project-structure)

## 🎯 Quick Start

### Prerequisites

Running `RuleReasoner` requires the Python packages listed in `requirements.txt` and the bundled `verl` package. Both are installed in the Installation steps.

### Installation

Build RuleReasoner from source and install its dependencies:

1. **Navigate to the project directory:**

    ```bash
    cd RuleReasoner
    ```

2. **Install the dependencies:**

    ```bash
    pip install -r requirements.txt
    pip install -e ./verl
    pip install -e .
    ```
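
After installation, a quick sanity check can confirm that the packages named in `requirements.txt` are present in the active environment. The helper below is a hypothetical sketch and not part of the repository: `missing_requirements` is an illustrative name, and it only checks that each package is installed, not that the installed version satisfies the listed constraint.

```python
import re
from importlib import metadata


def missing_requirements(path="requirements.txt"):
    """Return package names listed in a requirements file that are not installed."""
    missing = []
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            # Drop trailing comments and surrounding whitespace.
            line = raw.split("#", 1)[0].strip()
            # Skip blanks, pip options (-r, -e, --index-url), and direct URLs.
            if not line or line.startswith("-") or "://" in line:
                continue
            # Take the leading distribution name, stopping at any version specifier.
            match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
            if match is None:
                continue
            name = match.group(0)
            try:
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.append(name)
    return missing
```

Calling `missing_requirements()` from the project root returns an empty list when every listed package is found.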

### Training

Run the training with:

```bash
./scripts/train/train_mix.sh
```

### Evaluation

Run the evaluation with:

```bash
./scripts/eval/eval_model.sh \
    --model "$MODEL_PATH" \
    --datasets "$DATASET_PATH" \
    --output-dir "$OUTPUT_DIR"
```

## 🌳 Project Structure

```bash
└── RuleReasoner
    ├── README.md
    ├── requirements.txt
    ├── scripts
    │   ├── build_dataset.py
    │   ├── data
    │   ├── eval
    │   └── train
    ├── setup.py
    ├── src
    │   ├── __init__.py
    │   ├── data
    │   ├── globals.py
    │   ├── system_prompts.py
    │   └── utils.py
    └── verl
        └── ...
```
