# Unified Inference Scaling

This repository contains the implementation of our Unified Inference Scaling system, comprising the underlying Test-Time Scaling layer and the upper-level Model Routing layer.

We recommend reading the instructions for the two submodules, **ttsrouter-v1.1** and **TTSRouter**, in that order, and setting up the environments and dependencies accordingly.

## Outline

- [Unified Inference Scaling](#unified-inference-scaling)
  - [Outline](#outline)
  - [Directory Structure](#directory-structure)
  - [Getting Started](#getting-started)
    - [Installation](#installation)
    - [ttsrouter-v1.1](#ttsrouter-v11)
      - [Supported Main Tasks and Models](#supported-main-tasks-and-models)
      - [GPU configurations](#gpu-configurations)
      - [Quick verification](#quick-verification)
      - [How to full run](#how-to-full-run)
    - [TTSRouter](#ttsrouter)
      - [Run Evaluation](#run-evaluation)
    - [Data](#data)
  
## Directory Structure

The directory structure is as follows:

```bash
  .
  ├── TTSRouter/    # evaluation and high-level calls
  │   └── cache/
  ├── ttsrouter-v1.1/
  │   ├── src/
  │   │   ├── envs/
  │   │   ├── metrics/
  │   │   ├── reason/   # reasoning modules
  │   │   ├── scripts/  # all scripts you need
  │   │   ├── service/
  │   │   ├── utils.py
  │   │   ├── requirements.txt
  │   │   └── vllmchange.md
  │   └── README.md
  │
  ├── pruned_question_trials_combined.csv  # initial data feature tables
  └── README.md
```
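
To confirm that a checkout matches the tree above, a small path check can help. The sketch below is not part of the repository: `EXPECTED_PATHS` and `missing_paths` are hypothetical names, and the list covers only a few entries copied from the tree.

```python
from pathlib import Path

# A few paths copied from the directory tree above (relative to the repo root).
EXPECTED_PATHS = [
    "TTSRouter",
    "ttsrouter-v1.1/src/reason",
    "ttsrouter-v1.1/src/scripts",
    "ttsrouter-v1.1/src/requirements.txt",
    "ttsrouter-v1.1/src/vllmchange.md",
    "pruned_question_trials_combined.csv",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    print("layout OK" if not missing else f"missing: {missing}")
```

Run it from the repository root. An empty result means every listed path was found.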

## Getting Started

### Installation

Clone the repository:

```bash
git clone <ANONYMOUS_REPOSITORY_URL>
# From the root of the cloned repository:
cd ttsrouter-v1.1/src
```

Create a clean environment:

```bash
conda create -n tts python=3.12
conda activate tts
```

We recommend installing vLLM from a prebuilt `.whl` file, which covers most of the dependencies. You can download it from [vllm/releases](https://github.com/vllm-project/vllm/releases/download/v0.11.2/vllm-0.11.2-cp38-abi3-manylinux1_x86_64.whl).

Install the prebuilt wheel first (fast and reliable):

```bash
# Example: with the wheel placed in ttsrouter-v1.1/src (the current directory)
pip install vllm-0.11.2-cp38-abi3-manylinux1_x86_64.whl
```

Then install the rest of the dependencies:

```bash
pip install --upgrade-strategy=eager -r requirements.txt
```

Install `tmux` for serving policy models and PRMs:

```bash
sudo apt-get update
sudo apt-get install tmux
```

Install `yq` for processing YAML files:

```bash
mkdir -p "$CONDA_PREFIX/bin"
wget -qO "$CONDA_PREFIX/bin/yq" \
  https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64
chmod +x "$CONDA_PREFIX/bin/yq"
```
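
After these steps, it may be worth checking that the tools and packages are visible from the `tts` environment. The following is a minimal sketch using only the Python standard library; `check_environment` is a hypothetical helper, not something shipped with the repository.

```python
import importlib.util
import shutil


def check_environment(tools=("tmux", "yq"), modules=("vllm",)):
    """Map each tool or module name to True if it can be located."""
    status = {}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    for module in modules:
        # find_spec returns None when a top-level module is not installed.
        status[module] = importlib.util.find_spec(module) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

This only checks that each item can be found. It does not verify versions or that vLLM can actually use the GPU.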

### ttsrouter-v1.1

You need to cd into the `src` directory and activate the conda environment first.

```bash
conda activate tts
cd ttsrouter-v1.1/src
```

#### Supported Main Tasks and Models

1) Tasks:
   - [MATH-500](https://github.com/openai/prm800k)
   - [AIME24](https://huggingface.co/datasets/AI-MO/aimo-validation-aime)
   - [AIME25](https://huggingface.co/datasets/opencompass/AIME2025)

2) Policy Models (Qwen series):
   - [Qwen3](https://huggingface.co/collections/Qwen/qwen3): [0.6B](https://huggingface.co/Qwen/Qwen3-0.6B), [1.7B](https://huggingface.co/Qwen/Qwen3-1.7B), [4B](https://huggingface.co/Qwen/Qwen3-4B), [8B](https://huggingface.co/Qwen/Qwen3-8B), [14B](https://huggingface.co/Qwen/Qwen3-14B), [32B](https://huggingface.co/Qwen/Qwen3-32B)

3) Process Reward Models (PRMs):
   - [Skywork](https://huggingface.co/collections/Skywork/skywork-o1-open-67453df58e12f6c3934738d0): [Skywork-PRM-1.5B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen-2.5-1.5B), [Skywork-PRM-7B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen-2.5-7B)
   - [Qwen2.5-Math](https://huggingface.co/collections/Qwen/qwen25-math-66eaa240a1b7d5ee65f1da3e): [Qwen2.5-Math-PRM-7B](https://huggingface.co/Qwen/Qwen2.5-Math-PRM-7B)

> **Note:**
> Skywork-PRM support for vllm >= 0.6.4.post1 is no longer planned, so parts of the vLLM source code must be modified to run Skywork-PRM. Follow the steps below.
>
> First, locate the vLLM source code:
>```bash
>python -c "import vllm, inspect, pathlib; print(pathlib.Path(inspect.getfile(vllm)).parent)"
>```
>
> Second, apply the patch in `ttsrouter-v1.1/src/vllmchange.md` to the vLLM source code.
>
> Qwen-PRM does not require any changes to the vLLM source code. However, its step token (`<extra_0>`) differs from Skywork's step token (`"\n"`), so to use Qwen-PRM you need to either modify the logic in `src/reason/vllm_rm_infer_fns.py` or switch to `infer_fns.py`.
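
To illustrate why the step token matters, the sketch below formats the same reasoning steps with each separator. It is only an illustration: `join_steps` is a hypothetical helper, `<extra_0>` is the step token named in the Qwen2.5-Math-PRM model card, and the actual input construction in `src/reason/vllm_rm_infer_fns.py` may differ.

```python
# Step separators described in the note above.
SKYWORK_STEP_TOKEN = "\n"
QWEN_PRM_STEP_TOKEN = "<extra_0>"


def join_steps(steps, step_token):
    """Append `step_token` after every reasoning step (hypothetical helper)."""
    return "".join(step.strip() + step_token for step in steps)


steps = ["Let x = 2.", "Then x + 3 = 5."]
print(repr(join_steps(steps, SKYWORK_STEP_TOKEN)))
# 'Let x = 2.\nThen x + 3 = 5.\n'
print(repr(join_steps(steps, QWEN_PRM_STEP_TOKEN)))
# 'Let x = 2.<extra_0>Then x + 3 = 5.<extra_0>'
```

A PRM reads its per-step scores at the separator positions, so code that splits or scores on `"\n"` will not find the step boundaries in a Qwen-PRM input unless it is adapted.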

#### GPU configurations

| Policy model size             | PRM size | GPU               | CUDA |
|-------------------------------|----------|-------------------|------|
| 0.6B, 1.7B, 4B, 8B, 14B       | 1.5B     | 4x RTX 5090 32GB  | 12.8 |
| 32B                           | 1.5B     | 2x L40 44GB       | 13.0 |
| 0.6B, 1.7B, 4B, 8B, 14B, 32B  | 1.5B     | 4x A100 80GB      | 13.0 |

#### Quick verification

If you only need a quick verification, you can use an RTX 5090 GPU, complete the [Installation](#installation), and then follow these steps.

1) Download the policy models and PRMs, edit `scripts/config/5090x4.yaml` to set your model paths, and then serve the models:

```bash
conda activate tts
# Run the remaining commands from ttsrouter-v1.1/src
export VALUE_MODEL_PATH=/your_prm_model_path/skywork-prm-1.5b
export POLICY_MODEL_1_PATH=/your_policy_model_path/policy_models/Qwen3-0.6B
export LOGDIR="${PWD}/logs_vllm"
export HOST_ADDR=0.0.0.0
export CONTROLLER_PORT=21001
export WORKER_BASE_PORT=10081
bash scripts/serve_models.sh scripts/config/5090x4.yaml
```
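
A misconfigured path or port is easier to catch before the servers start. The sketch below checks the variables exported above. `find_problems` is a hypothetical helper, and it assumes each model path is a local directory, which may not match your setup.

```python
import os
from pathlib import Path

# Variable names come from the export lines above.
PATH_VARS = ("VALUE_MODEL_PATH", "POLICY_MODEL_1_PATH")
PORT_VARS = ("CONTROLLER_PORT", "WORKER_BASE_PORT")


def find_problems(env):
    """Return a list of human-readable problems found in `env`."""
    problems = []
    for name in PATH_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():  # assumption: models are local directories
            problems.append(f"{name} does not point to a directory: {value}")
    for name in PORT_VARS:
        value = env.get(name, "")
        if not value.isdecimal() or not 0 < int(value) < 65536:
            problems.append(f"{name} is not a valid port: {value!r}")
    return problems


if __name__ == "__main__":
    for problem in find_problems(os.environ):
        print("WARNING:", problem)
```

Run it in the same shell after the `export` lines. No output means none of the checked variables looked wrong.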

2) Start inference (in the same terminal):

```bash
bash scripts/run_all_variants.sh 
```

3) Data collection (after inference):

   Run our script [ttsrouter-v1.1/src/scripts/sample_qp_per_question_pruned.py](/ttsrouter-v1.1/src/scripts/sample_qp_per_question_pruned.py) to collect all the required data.

#### How to full run

You can also run the system with all features enabled.

For a complete guide, refer to the detailed instructions in [ttsrouter-v1.1/README.md](/ttsrouter-v1.1/README.md).

### TTSRouter

You need to cd into the `TTSRouter` directory and activate the conda environment first.

```bash
# After returning to the project root directory
conda activate tts
cd TTSRouter
```

#### Run Evaluation

The following example runs the evaluation:

```bash
python run_evaluation.py
# Note: the "answer missing" message printed here is expected
```

### Data

Our experiments use a total of 100K samples: 70K from MATH-500 and 30K from AIME24 and AIME25.

Because the initial data is very large (about 50GB), we include only the data feature tables used in subsequent experiments. The detailed experimental data will be open-sourced after the paper is accepted.

You can find the data feature tables in [pruned_question_trials_combined.csv](pruned_question_trials_combined.csv).
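
For a first look at the table, the sketch below prints its column names and row count using only the standard library. It makes no assumption about the schema, and `summarize_csv` is a hypothetical helper rather than part of the repository.

```python
import csv
import os


def summarize_csv(path):
    """Return (column_names, row_count) without assuming any schema."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first row is treated as the header
        row_count = sum(1 for _ in reader)
    return header, row_count


if __name__ == "__main__":
    path = "pruned_question_trials_combined.csv"
    if os.path.exists(path):
        columns, rows = summarize_csv(path)
        print(f"{rows} rows, {len(columns)} columns")
        print(columns)
    else:
        print(f"{path} not found; run this from the repository root")
```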
