Abstract: Solving complex mathematical problems via system-2 reasoning is a natural human skill, yet it remains a significant challenge for current large language models (LLMs). We identify the scarcity of deliberate multi-step reasoning data as a primary limiting factor. To this end, we introduce Enriched Instruction Tuning (EIT), a method that enriches existing human-annotated mathematical datasets with AI-generated feedback to create fine-grained reasoning trajectories. These enriched datasets are then used to fine-tune open-source LLMs, enhancing their mathematical reasoning abilities without reliance on any symbolic verification program. Concretely, EIT consists of two critical steps: Enriching with Reasoning Plan (ERP) and Enriching with Reasoning Step (ERS). The former generates a high-level plan that breaks down complex instructions into a sequence of simpler objectives, while ERS fills in reasoning contexts often overlooked by human annotators, creating a smoother reasoning trajectory for LLM fine-tuning. Unlike existing CoT prompting methods, which generate reasoning chains that depend only on the LLM's internal knowledge, our method leverages human-annotated initial answers as ``meta-knowledge'' to help LLMs generate more detailed and precise reasoning processes, leading to a more trustworthy LLM expert for complex mathematical problems. In experiments, EIT achieves an accuracy of 84.1% on GSM8K and 32.5% on MATH, surpassing state-of-the-art fine-tuning and prompting methods and even matching the performance of tool-augmented methods.
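To give a concrete picture of the two enrichment steps described above, the sketch below shows how one might apply ERP and ERS to a single (question, human-annotated answer) pair. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm_generate` helper, the `enrich_example` function, and the prompt wording are all hypothetical stand-ins introduced here for clarity.

```python
# Minimal sketch of the two EIT enrichment steps (ERP and ERS).
# NOTE: `llm_generate` is a hypothetical placeholder for any LLM
# completion call; the prompt templates are illustrative only and
# are not the paper's exact prompts.

def llm_generate(prompt: str) -> str:
    """Placeholder for a call to an instruction-following LLM."""
    raise NotImplementedError("Wire this to your LLM API of choice.")

def enrich_example(question: str, human_answer: str) -> dict:
    # Step 1 -- Enriching with Reasoning Plan (ERP):
    # ask the model for a high-level plan that decomposes the problem
    # into simpler objectives, conditioned on the human-annotated
    # answer as "meta-knowledge".
    plan = llm_generate(
        f"Question: {question}\n"
        f"Reference answer: {human_answer}\n"
        "Write a high-level plan that breaks this problem into a "
        "sequence of simpler objectives."
    )

    # Step 2 -- Enriching with Reasoning Step (ERS):
    # ask the model to fill in intermediate reasoning steps that the
    # human annotator skipped, following the plan.
    enriched_answer = llm_generate(
        f"Question: {question}\n"
        f"Plan: {plan}\n"
        f"Reference answer: {human_answer}\n"
        "Rewrite the answer so that every intermediate reasoning step "
        "is made explicit, producing a smooth, complete trajectory."
    )

    # The enriched (question, answer) pair is what would be used for
    # supervised fine-tuning of an open-source LLM.
    return {"question": question, "plan": plan, "answer": enriched_answer}
```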
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We extend our sincere gratitude to the reviewers and the action editor for their valuable comments and constructive suggestions. In response to their feedback, we have revised the manuscript as follows:
● We have provided further clarification in Section 3.1 on the process used to verify the output fidelity of the EIT-generated data.
● Additional experimental details have been included in Section 6.2 on how output granularity is controlled for the analysis, and in Section 6.3 on how the perplexity metric is computed.
● A new quantitative analysis has been added in Appendix A.3 to demonstrate the effectiveness of our EIT method across varying levels of difficulty.
● The description of our method has been refined to more accurately characterize it as a form of data augmentation rather than an interactive collaboration.
We believe that these revisions have substantially enhanced the clarity and rigor of the paper. Once again, we thank the reviewers and the action editor for their insightful input, which has greatly improved our work.
Assigned Action Editor: ~Hongsheng_Li3
Submission Number: 4135