**Note:** This attachment demonstrates only part of the functionality; the final code and data will be open-sourced in the future. Most of the project consists of Python files and shell scripts that can be run directly.

**Workflow Overview**

Prepare a set of original math reasoning problems in JSON format.

Go to _diff/irt/_, which contains all code for IRT parameter estimation. Use _api_openai.py_ or _api_zhipu.py_ (specifying an output folder) to obtain large-model responses via API calls, then run _get_reponse.py_ to aggregate that folder’s files into a response-matrix JSONL file.
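
The aggregation step can be sketched as follows. This is a minimal illustration, not the actual _get_reponse.py_: it assumes each file in the output folder is a JSON list of records with hypothetical `id` and `correct` fields, uses the file name as the model (subject) identifier, and writes one line per model in the `subject_id` / `responses` layout that _py-irt_ reads.

```python
import json
from pathlib import Path


def build_response_matrix(response_dir, out_path):
    """Aggregate per-model result files into one response-matrix JSONL file.

    Assumed (hypothetical) input layout: one JSON file per model, each a list
    of records such as {"id": "q1", "correct": true}.
    """
    rows = []
    for path in sorted(Path(response_dir).glob("*.json")):
        records = json.loads(path.read_text(encoding="utf-8"))
        # 1 = the model answered the item correctly, 0 = it did not.
        responses = {str(r["id"]): int(bool(r["correct"])) for r in records}
        rows.append({"subject_id": path.stem, "responses": responses})

    with open(out_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return rows
```

Each output line then describes one model, and each key under `responses` is one problem.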

Use _vae.py_ and _sampling.py_ to perform data augmentation based on the response-matrix JSONL produced in the previous step.
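
As a rough picture of what sampling-based augmentation of a response matrix can look like, the sketch below draws synthetic respondents from per-item correctness probabilities. It is an assumption-laden illustration rather than the logic of _sampling.py_: it presumes that some generative model (for example, a trained VAE decoder) has already produced a probability for each item, and the function name and `synthetic_` prefix are hypothetical.

```python
import random


def sample_synthetic_subjects(item_probs, n_subjects, seed=0):
    """Draw binary response rows from per-item correctness probabilities.

    item_probs: dict mapping item id -> probability of a correct answer
                (assumed to come from a generative model such as a VAE decoder).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    rows = []
    for k in range(n_subjects):
        # Independent Bernoulli draw per item.
        responses = {item: int(rng.random() < p) for item, p in item_probs.items()}
        rows.append({"subject_id": f"synthetic_{k}", "responses": responses})
    return rows
```

The rows use the same `subject_id` / `responses` layout as the original response matrix, so they can be appended to the same JSONL file.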

Run _run_irt.sh_ to invoke _py-irt_ for parameter estimation. After difficulty parameters are estimated, run _embedding.py_ to compute embedding vectors for the problems.
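
For orientation, the difficulty parameter being estimated enters the standard logistic IRT model as shown below. This sketch only evaluates the model for given parameters; it does not reproduce the fitting procedure of _py-irt_, and which IRT variant _run_irt.sh_ selects is not assumed here. The function names are illustrative.

```python
import math


def p_correct(theta, difficulty, discrimination=1.0):
    """2PL item response function.

    theta: ability of the respondent (here, a model).
    difficulty: item difficulty parameter.
    discrimination=1.0 reduces this to the 1PL (Rasch) form.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


def log_likelihood(responses, abilities, difficulties):
    """Log-likelihood of a 0/1 response matrix under the 1PL model."""
    total = 0.0
    for subject, answers in responses.items():
        for item, y in answers.items():
            p = p_correct(abilities[subject], difficulties[item])
            total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total
```

When ability equals difficulty the predicted probability of a correct answer is 0.5; harder items (larger difficulty) push it toward 0.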

Run _rank.py_ to train a multilayer-perceptron difficulty ranker; the trained ranker provides the reward signal used in _rl/rewrite_reward.py_.
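
A minimal sketch of how an MLP ranker over problem embeddings could yield a reward is given below. The architecture, the RankNet-style pairwise loss, and the sigmoid-shaped reward are all assumptions made for illustration; _rank.py_ and _rl/rewrite_reward.py_ may differ in every one of these choices.

```python
import math


def mlp_score(x, w1, b1, w2, b2):
    """One-hidden-layer MLP (ReLU) mapping an embedding to a scalar difficulty score."""
    hidden = [
        max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def pairwise_ranking_loss(score_harder, score_easier):
    """RankNet-style loss: small when the harder problem receives the higher score."""
    return math.log1p(math.exp(-(score_harder - score_easier)))


def difficulty_gain_reward(score_rewritten, score_original):
    """Hypothetical reward in (0, 1): above 0.5 when the rewrite is ranked harder."""
    return 1.0 / (1.0 + math.exp(-(score_rewritten - score_original)))
```

Training on pairs ordered by their estimated IRT difficulty only requires relative order, which is why a ranking loss is used in this sketch instead of regressing the difficulty values directly.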

**SFT and RL**

The _rl/_ directory contains our SFT + RL code. Use _data_process_ to convert the data into the format required for SFT or RL.

_run_sft.sh_ is the SFT training script; it calls the LLaMA-Factory CLI (see the official LLaMA-Factory documentation for its options).

After SFT, set the SFT model path and run _train_gspo.py_ to start reinforcement learning training.
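
For readers unfamiliar with GSPO (Group Sequence Policy Optimization), the core of its objective can be sketched in plain Python as below. This is a didactic sketch, not the contents of _train_gspo.py_: it works on lists of per-token log-probabilities, uses the population standard deviation for group normalization, and the default `clip_eps` is only a placeholder value.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def sequence_ratio(new_token_logps, old_token_logps):
    """Length-normalized sequence-level importance ratio."""
    n = len(new_token_logps)
    return math.exp((sum(new_token_logps) - sum(old_token_logps)) / n)


def gspo_objective(new_logps, old_logps, rewards, clip_eps=0.2):
    """Clipped surrogate objective averaged over the group (to be maximized)."""
    terms = []
    for new, old, adv in zip(new_logps, old_logps, group_advantages(rewards)):
        s = sequence_ratio(new, old)
        s_clipped = min(max(s, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(s * adv, s_clipped * adv))
    return mean(terms)
```

The difference from token-level methods such as GRPO is that the importance ratio and its clipping are applied once per response, not once per token.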

**Evaluation and Generation**

In _exp/_, _api_openai.py_ and _api_zhipu.py_ can be used for model evaluation.

_qa_model.py_ provides a method to evaluate a local model.

_rule_base/data_generation.py_ implements our rule-based perturbation scheme, inspired by the GSM-Plus paper.

_rule_base/win.py_ contains model-based judging code that compares our perturbed problems with the rule-based ones.
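
Once the judge has produced one verdict per problem pair, the comparison reduces to a win-rate tally such as the one sketched here. The verdict labels `ours`, `baseline`, and `tie` are hypothetical; the output format of _rule_base/win.py_ is not assumed.

```python
from collections import Counter

VALID_VERDICTS = {"ours", "baseline", "tie"}  # hypothetical judge labels


def win_rate(verdicts):
    """Return win/tie/loss fractions for our perturbed problems."""
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    unknown = set(verdicts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    counts = Counter(verdicts)
    total = len(verdicts)
    return {
        "win": counts["ours"] / total,
        "tie": counts["tie"] / total,
        "loss": counts["baseline"] / total,
    }
```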