# README: TrigReason

This repository contains the official implementation of **TrigReason: Trigger-Based Collaboration between Small and Large Reasoning Models**, a framework in which small and large reasoning models collaborate on reasoning tasks, with trigger-based mechanisms deciding when each model is used.



## 1. Setup

### Create Conda Environment

```bash
conda create -n Trigreason python=3.12 -y
conda activate Trigreason
```

> Make sure you have SGLang installed and configured properly. You can install it via pip:
> ```bash
> pip install sglang
> ```

---

## 2. Launch SGLang Servers

Start the SGLang inference servers for both the large and small models.

### Launch QwQ-32B (Large Model)

```bash
python3 -m sglang.launch_server \
    --model-path Qwen/QwQ-32B \
    --port 30001 \
    --mem-fraction-static 0.8 \
    --tp 4 \
    --host 0.0.0.0
```

### Launch DeepSeek-R1-Distill-Qwen-1.5B (Small Model)

```bash
python3 -m sglang.launch_server \
    --model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
    --port 30002 \
    --mem-fraction-static 0.6 \
    --tp 4 \
    --host 0.0.0.0
```
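
Before moving on, it can help to confirm that both servers are reachable. The following is a minimal sketch, not part of the repository: it assumes the servers are reachable over plain HTTP and polls the `/health` route of each SGLang server. The helper names `server_is_up` and `wait_for_servers` are hypothetical.

```python
import time
import urllib.request


def server_is_up(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are all OSError subclasses.
        return False


def wait_for_servers(base_urls, probe=server_is_up, retries=60, delay=5.0):
    """Poll every server until all respond or the retries run out.

    Returns the list of URLs that never became ready (empty on success).
    """
    pending = list(base_urls)
    for attempt in range(retries):
        # Keep only the servers that still fail the probe.
        pending = [url for url in pending if not probe(url)]
        if not pending:
            break
        if attempt < retries - 1:
            time.sleep(delay)
    return pending
```

For example, `wait_for_servers(["http://localhost:30001", "http://localhost:30002"])` returns an empty list once both servers respond, assuming they run on the local machine with the ports used above.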

> 💡 **Note**: Replace the model paths with local paths if necessary, and set `--tp` to the number of GPUs each server should use. Ensure your GPUs have sufficient memory (e.g., 80GB+ for QwQ-32B with tensor parallelism).

### Update Model Endpoints in Code

In `trig_reason.py`, modify the `model_names` dictionary to reflect your server IPs and ports:

```python
model_names = {
    "QwQ-32B": "your_large_model_ip:30001",           # e.g., localhost:30001
    "DeepSeek-R1-Distill-Qwen-1___5B": "your_small_model_ip:30002",  # e.g., localhost:30002
    "Qwen3-A3B": "localhost:30000",
    "Qwen3-0___6B": "localhost:30001",
    "deepseek-reasoner": "https://api.deepseek.com"
}
```

If the servers run on the same machine as `trig_reason.py`, `localhost` is sufficient. For remote servers, use the server's IP address.
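
The values above mix bare `host:port` strings with a full `https://` URL. How `trig_reason.py` consumes them is not shown here; the sketch below is only one way to validate a chosen pair before a run. `to_base_url` and `resolve_endpoints` are hypothetical helpers, not functions from the repository, and defaulting to `http://` for bare `host:port` values is an assumption.

```python
def to_base_url(endpoint: str) -> str:
    """Return the endpoint with an explicit scheme (http:// is assumed for host:port)."""
    endpoint = endpoint.strip().rstrip("/")
    if endpoint.startswith(("http://", "https://")):
        return endpoint
    return f"http://{endpoint}"


def resolve_endpoints(model_names: dict, big_model_name: str, small_model_name: str):
    """Look up both models and return (big_url, small_url).

    Raises KeyError if a name is not a key of model_names, and ValueError
    if both models point at the same endpoint (for example, a port collision).
    """
    missing = [name for name in (big_model_name, small_model_name) if name not in model_names]
    if missing:
        raise KeyError(f"not found in model_names: {missing}")
    big_url = to_base_url(model_names[big_model_name])
    small_url = to_base_url(model_names[small_model_name])
    if big_url == small_url:
        raise ValueError(f"both models resolve to {big_url}")
    return big_url, small_url


# Example values only; replace the hosts with your own.
example_model_names = {
    "QwQ-32B": "localhost:30001",
    "DeepSeek-R1-Distill-Qwen-1___5B": "localhost:30002",
    "deepseek-reasoner": "https://api.deepseek.com",
}
```

Running a check like this up front surfaces a mistyped model name or two models sharing one port before any tokens are generated.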

---

## 3. Run TrigReason

Execute the reasoning pipeline with the following command:

```bash
python trig_reason.py \
    --dataset_name aime24 \
    --problem_id 60 \
    --repeat_id 0 \
    --start_step 20 \
    --rectify_step 1 \
    --token_budget 8192 \
    --confident_threshold 0.85 \
    --big_model_name QwQ-32B \
    --small_model_name DeepSeek-R1-Distill-Qwen-1___5B
```
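
The command above solves a single problem once. To cover several problems or repeats, one option is a small driver script. The sketch below is an assumption about how you might batch runs, not a script shipped with the repository. It reuses the flag values from the command above, and `build_command` and `run_all` are hypothetical names.

```python
import itertools
import subprocess
import sys


def build_command(problem_id: int, repeat_id: int, dataset_name: str = "aime24") -> list[str]:
    """Build one trig_reason.py invocation with the flag values shown above."""
    return [
        sys.executable, "trig_reason.py",
        "--dataset_name", dataset_name,
        "--problem_id", str(problem_id),
        "--repeat_id", str(repeat_id),
        "--start_step", "20",
        "--rectify_step", "1",
        "--token_budget", "8192",
        "--confident_threshold", "0.85",
        "--big_model_name", "QwQ-32B",
        "--small_model_name", "DeepSeek-R1-Distill-Qwen-1___5B",
    ]


def run_all(problem_ids, repeat_ids, dataset_name: str = "aime24") -> None:
    """Run every (problem, repeat) pair sequentially, stopping on the first failure."""
    for problem_id, repeat_id in itertools.product(problem_ids, repeat_ids):
        subprocess.run(build_command(problem_id, repeat_id, dataset_name), check=True)
```

For instance, `run_all(range(60, 63), range(2))` would launch six runs one after another from the directory containing `trig_reason.py`.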

### Parameter Explanation

| Parameter | Description |
|---------|-------------|
| `--dataset_name` | Dataset to evaluate on. Options: `aime24`, `gpqa`, or `aime25`. Default: `aime24`. |
| `--problem_id` | Index of the problem to solve within the dataset. For `aime24`, the valid range is 60–89. |
| `--repeat_id` | Identifier for repeated runs (useful for statistical evaluation). Typically in the range 0–15. |
| `--start_step` | Number of initial steps handled by the **large model** before switching to the small model (strategic priming). |
| `--rectify_step` | Number of steps for which reasoning **falls back to the large model** when hesitation patterns are detected in consecutive reasoning steps. |
| `--token_budget` | Maximum total number of output tokens allowed during reasoning, which guards against runs that never terminate. Default: 8192. |
| `--confident_threshold` | Threshold for the **cognitive offload trigger**. If the ratio of tokens in a step with perplexity < 1.05 exceeds this value, the step is considered overconfident and the large model takes over. |
| `--big_model_name` | Name of the large reasoning model (must match key in `model_names`). |
| `--small_model_name` | Name of the small reasoning model (must match key in `model_names`). |
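
### Confidence Trigger Sketch

To make the `--confident_threshold` rule concrete, here is a minimal sketch of the check as the table describes it. It is not the repository's implementation: it assumes that per-token perplexity is computed as `exp(-logprob)` from the small model's token log-probabilities, and `is_overconfident` is a hypothetical name.

```python
import math


def is_overconfident(token_logprobs, confident_threshold: float, ppl_cutoff: float = 1.05) -> bool:
    """Return True if the share of low-perplexity tokens exceeds the threshold.

    token_logprobs: natural-log probabilities of the tokens in one reasoning step.
    Per-token perplexity is taken to be exp(-logprob) (an assumption of this sketch).
    """
    if not token_logprobs:
        return False  # an empty step cannot trigger the hand-over
    low_ppl = sum(1 for lp in token_logprobs if math.exp(-lp) < ppl_cutoff)
    return low_ppl / len(token_logprobs) > confident_threshold
```

With `confident_threshold=0.85`, a ten-token step in which nine tokens have a log-probability of `-0.01` (perplexity about 1.01) has a ratio of 0.9 and would trigger the hand-over, whereas eight such tokens out of ten (ratio 0.8) would not.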






