SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Dian Yu; Baolin Peng; Ye Tian; Linfeng Song; Tao Ge; Haitao Mi; Dong Yu

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Dian Yu, Baolin Peng, Ye Tian, Linfeng Song, Tao Ge, Haitao Mi, Dong Yu

27 Sept 2024 (modified: 16 Dec 2024)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Large Language Models, Mathematical Reasoning, Code Generation

TL;DR: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Abstract: There is a growing trend of teaching large language models (LLMs) to solve mathematical problems through coding. Existing studies primarily focus on prompting powerful, closed-source models to generate seed training data followed by in-domain data augmentation, equipping LLMs with considerable capabilities for code-assisted mathematical reasoning. However, continually training these models on augmented data derived from a few datasets such as GSM8K may impair their generalization abilities and restrict their effectiveness to limited question types. Conversely, the potential of improving such LLMs by leveraging large-scale, expert-written, diverse math question-answer pairs remains unexplored. To utilize these resources and tackle unique challenges such as code response assessment, we propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. We also explore different alignment algorithms with self-generated instruction/preference data to foster continuous self-improvement. Experiments across both in-distribution (up to $+5.7\%$) and out-of-distribution ($+4.4\%$) benchmarks in English and Chinese show the effectiveness of the proposed paradigm.

Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 11247

Loading