Thank you for taking the time to review our paper, "Probing to Refine: Reinforcement Distillation of LLM Reasoners via Explanatory Inversion."

Here we provide the code for running our framework.

To run the SFT process, use the following command:

bash train_augment_distill_sft.sh

After SFT completes, run the following commands for ExGRPO, starting the vLLM server first:

bash vllm_server.sh
bash train_multiturn_grpo.sh
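The two stages above could also be chained from a single driver script. The following is only a sketch and is not part of the released code. It assumes that all three scripts sit in the current directory, that vllm_server.sh keeps running in the foreground (so it has to be sent to the background before training starts), and that a fixed wait is enough for the server to come up. The function names run_stage and run_pipeline and the variable SERVER_WARMUP_SECONDS are hypothetical, and the 60-second default is a placeholder rather than a measured value.

```shell
#!/usr/bin/env bash
# Hypothetical driver for the SFT and ExGRPO stages (a sketch, not the authors' script).

# Run one stage script from the current directory; fail early if it is missing.
run_stage() {
  local script="$1"
  if [ ! -f "$script" ]; then
    echo "missing script: $script" >&2
    return 1
  fi
  bash "$script"
}

# Run SFT first, then ExGRPO with the vLLM server in the background.
run_pipeline() {
  # Assumption: a fixed warm-up delay is sufficient; 60 s is a placeholder.
  local warmup="${SERVER_WARMUP_SECONDS:-60}"
  local server_pid status

  # Stage 1: SFT. Stop here if it fails.
  run_stage train_augment_distill_sft.sh || return 1

  # Stage 2a: start the server in the background
  # (assumes vllm_server.sh would otherwise block the shell).
  if [ ! -f vllm_server.sh ]; then
    echo "missing script: vllm_server.sh" >&2
    return 1
  fi
  bash vllm_server.sh &
  server_pid=$!

  # Give the server time to load before training starts.
  sleep "$warmup"

  # Stage 2b: ExGRPO training.
  run_stage train_multiturn_grpo.sh
  status=$?

  # Stop the background server once training has finished or failed.
  kill "$server_pid" 2>/dev/null
  wait "$server_pid" 2>/dev/null
  return "$status"
}
```

After saving and sourcing this file, calling run_pipeline from the repository directory runs both stages in order. Running the two ExGRPO commands by hand in separate terminals remains the simpler option if the server needs to be inspected while training.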

