Self-Improving Mathematical Reasoning of Large Language Models with a Code-Centric Paradigm

ACL ARR 2024 June Submission3556 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: There is a growing trend of teaching large language models (LLMs) to solve mathematical problems by writing and executing code during problem solving. Existing studies primarily focus on distilling powerful, closed-source models and on in-domain data augmentation, equipping LLMs with considerable capacity for mathematical reasoning via coding. However, the self-improvement of such LLMs by leveraging large-scale, expert-written, diverse math question-answer pairs remains under-explored. To bridge this gap and tackle challenges such as code response assessment, we propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. We also explore different alignment algorithms with self-generated instruction/preference data to foster continuous improvement. Experiments on both in-domain (up to $+5.7\%$) and out-of-domain ($+4.4\%$) benchmarks in English and Chinese demonstrate the effectiveness of self-improving LLMs with the proposed paradigm.
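To make the described pipeline concrete, below is a minimal sketch of how a code-based critic might gate self-generated training data: sample code solutions from the current model, execute them, score each (question, code, output) triple with the critic, and keep only high-scoring pairs for the next training round. All names here (`generate_code_solution`, `critic_score`, the sampling count, the 0.5 threshold) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of critic-guided question-code data construction.
# The model-backed functions are stubs; in the paper's paradigm they would
# be served by the current LLM and a trained code-based critic model.
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    code: str
    critic_score: float


def run_code(code: str, timeout: float = 5.0) -> str:
    """Execute a candidate solution in a subprocess and capture its stdout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return ""


def generate_code_solution(question: str) -> str:
    """Placeholder: sample a code solution from the current LLM."""
    raise NotImplementedError


def critic_score(question: str, code: str, output: str) -> float:
    """Placeholder: the code-based critic rates a (question, code, output)
    triple with a quality score in [0, 1]."""
    raise NotImplementedError


def build_dataset(questions: list[str], n_samples: int = 4,
                  threshold: float = 0.5) -> list[Sample]:
    """Keep only question-code pairs the critic rates above the threshold,
    yielding self-generated instruction data for the next training round."""
    kept: list[Sample] = []
    for q in questions:
        for _ in range(n_samples):
            code = generate_code_solution(q)
            output = run_code(code)
            score = critic_score(q, code, output)
            if score >= threshold:
                kept.append(Sample(q, code, score))
    return kept
```

The same scored samples could plausibly also supply preference pairs (highest- vs. lowest-scored solutions per question) for the alignment algorithms the abstract mentions, though the exact construction is not specified here.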
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: mathematical NLP, evaluation, NLP datasets, math QA, applications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, Chinese
Submission Number: 3556