# Supplementary Materials

This package contains the datasets, training configurations, scripts, and prompts used to reproduce the results in our submission.

## 1. Datasets (`dataset/`)

Due to file size constraints, we provide the training data used for our 1.5B model experiments in `.parquet` format.


## 2. Training Implementation

Our training pipeline is built upon the open-source **DFT** framework.

* **Base Repository:** [https://github.com/yongliang-wu/DFT](https://github.com/yongliang-wu/DFT)
* **Environment Setup:** Please refer to the `README.md` in the base repository for installation instructions.

### Provided Training Scripts (`scripts/`)
We provide the specific training scripts and configurations used in our paper:

* `scripts/run_1.5b.sh`: Configuration for the 1.5B model.
* `scripts/run_7b.sh`: Configuration for the 7B model.

> **⚠️ Note on 7B Model Training:**
> Due to VRAM constraints, we set `data.micro_batch_size_per_gpu=2` for the 7B model experiments. This configuration is explicitly reflected in `scripts/run_7b.sh`.

## 3. Evaluation

### Downstream Utility
Our evaluation protocol follows the standard benchmarks provided by the **Qwen2.5-Math** repository.
* **Base Repository:** [https://github.com/QwenLM/Qwen2.5-Math/tree/main](https://github.com/QwenLM/Qwen2.5-Math/tree/main)

### Perceived Quality (Reward Scoring)
To reproduce the "Perceived Quality" analysis presented in the paper, we provide the scoring script used to evaluate data samples:

* `scripts/reward_scoring.py`: The script utilizes the **Qwen2.5-72B-reward** model to compute reward scores for the generated reasoning paths.

## 4. Prompts (`prompts/`)

We provide the exact system prompts used for our data synthesis pipeline:

* `prompts/filter.txt`: The prompt used for the initial conservative filtering of the SLM-RFT data.
* `prompts/refinement.txt`: The prompt used for standard Oracle Refinement.
* `prompts/style_aligned_refinement.txt`: The prompt used for our proposed Style-Aligned Refinement.