### Install SGLang
Before running the code, install the SGLang framework, which provides efficient inference.
Follow the official installation guide: [SGLang Installation](https://docs.sglang.ai/start/install.html)

### Launch Models
Next, launch the models with the SGLang framework:
```
bash launch_model.sh [GPU]
```

### Inference
With the models running, generate responses for each problem from the desired Large Reasoning Model using the following command:
```
python inference.py --model_name [MODEL_NAME] --model_path [MODEL_PATH]
```

### Estimating correctness probability
Each response is segmented into thinking pattern units based on linguistic cues (e.g., “Wait”) that indicate transitions between reasoning patterns. At each segment index, we sample multiple completions and use them to form a Monte Carlo estimate of the probability of reaching the correct answer.
```
python rollout.py
```
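The segmentation and estimation step can be sketched as follows. This is a minimal illustration, not the contents of `rollout.py`: the cue list `CUES`, the helper names, and the `sample_completion` / `is_correct` callables are all assumptions standing in for the real model call and answer checker.

```python
import re

# Hypothetical cue words; the actual list used by the repository may differ.
CUES = ("Wait", "Alternatively", "Hmm")


def segment_by_cues(response, cues=CUES):
    """Split a response so that each cue word starts a new segment."""
    # Zero-width lookahead: split *before* each cue, keeping the cue in the text.
    pattern = r"(?=\b(?:" + "|".join(map(re.escape, cues)) + r")\b)"
    return [s for s in re.split(pattern, response) if s.strip()]


def estimate_correctness(prefix, sample_completion, is_correct, n_samples=8):
    """Monte Carlo estimate: fraction of sampled completions judged correct."""
    hits = sum(bool(is_correct(sample_completion(prefix))) for _ in range(n_samples))
    return hits / n_samples


def correctness_by_segment(response, sample_completion, is_correct, n_samples=8):
    """Estimate the correctness probability after each segment index."""
    segments = segment_by_cues(response)
    probs = []
    for i in range(1, len(segments) + 1):
        prefix = "".join(segments[:i])  # reasoning up to and including segment i
        probs.append(estimate_correctness(prefix, sample_completion, is_correct, n_samples))
    return segments, probs
```

In practice `sample_completion` would query the model served by SGLang and `is_correct` would compare the extracted answer with the reference; both are left abstract here.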

### Conclude the reasoning trajectory
Once the estimated correctness probability reaches the threshold, we coherently conclude the reasoning process and generate completions conditioned on the finalized trajectory.
```
python conclude.py
```
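As a rough sketch of this step, assuming per-segment probability estimates are already available, one might cut the trajectory at the first segment that reaches the threshold and append a closing phrase. The function names, the default threshold of `0.9`, and the `CLOSING` text are hypothetical and are not taken from `conclude.py`.

```python
# Hypothetical closing text that nudges the model to finish its reasoning.
CLOSING = "\nTherefore, the final answer is"


def truncate_at_threshold(segments, probs, threshold=0.9):
    """Return the shortest prefix whose estimated correctness reaches the threshold."""
    for i, p in enumerate(probs):
        if p >= threshold:
            return list(segments[: i + 1])
    # Threshold never reached: keep the full trajectory.
    return list(segments)


def build_conclusion_prompt(question, segments, closing=CLOSING):
    """Concatenate the question, the finalized trajectory, and the closing text."""
    return question + "".join(segments) + closing
```

The resulting prompt would then be sent to the model to generate the final completion.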

### Applying a pruning function with auxiliary models
We evaluate the utility of each intermediate thinking pattern using an auxiliary model. To ensure that the pruned sequence still leads to the correct answer, we perform a lightweight decoding step for validation. Patterns that are identified as redundant and that pass the validation step are removed by the pruning function.
To do this, first launch the auxiliary model:
```
bash launch_aux.sh [GPU]
```
Then run the pruning script:
```
python prune.py
```
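The pruning logic described above might look like the sketch below. It is illustrative only: `is_redundant` stands in for the auxiliary model's utility judgment and `still_correct` for the lightweight validation decode, and neither name comes from `prune.py`.

```python
def prune_segments(segments, is_redundant, still_correct):
    """Remove segments judged redundant, but only if validation still passes.

    is_redundant(segments, i) -> bool : assumed auxiliary-model judgment for segment i
    still_correct(segments)   -> bool : assumed validation that the pruned
                                        trajectory still yields the correct answer
    """
    kept = list(segments)
    i = 0
    while i < len(kept):
        candidate = kept[:i] + kept[i + 1:]
        if is_redundant(kept, i) and still_correct(candidate):
            # Drop segment i; stay at the same index because the list shifted left.
            kept = candidate
        else:
            i += 1
    return kept
```

Validating after each candidate removal, rather than once at the end, keeps a single harmful deletion from invalidating the whole pruned trajectory, at the cost of more decoding calls.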

### Construct pairwise dataset
Through the above process, we construct a pairwise dataset consisting of optimized and suboptimal trajectories that differ in their underlying thinking dynamics. This dataset can then be used for preference optimization.
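For illustration, a single pair could be stored as a JSON Lines record like the one built below. The `prompt` / `chosen` / `rejected` field names follow a common convention in preference optimization tooling and are an assumption here, as are the helper names; the format actually written by the repository may differ.

```python
import json


def make_preference_pair(question, optimized_segments, suboptimal_segments):
    """Pair an optimized trajectory (chosen) with a suboptimal one (rejected)."""
    return {
        "prompt": question,
        "chosen": "".join(optimized_segments),
        "rejected": "".join(suboptimal_segments),
    }


def write_jsonl(pairs, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```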
```
python construct_dataset.py
```