Self-Improving Mathematical Reasoning of Large Language Models with a Code-Centric Paradigm

ACL ARR 2024 June Submission3556 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: There is a growing trend of teaching large language models (LLMs) to solve mathematical problems by writing and executing code during problem solving. Existing studies primarily focus on distilling powerful, closed-source models and on in-domain data augmentation, equipping LLMs with considerable capacity for mathematical reasoning via coding. However, the self-improvement of such LLMs by leveraging large-scale, expert-written, diverse math question-answer pairs remains under-explored. To bridge this gap and tackle challenges such as code response assessment, we propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. We also explore different alignment algorithms with self-generated instruction/preference data to foster continuous improvement. Experiments on both in-domain (up to $+5.7\%$) and out-of-domain ($+4.4\%$) benchmarks in English and Chinese demonstrate the effectiveness of self-improving LLMs with the proposed paradigm.
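To make the described pipeline concrete, below is a minimal sketch of how a code-based critic might gate self-generated training data: sample code solutions from the current model, execute them, score each (question, code, output) triple with the critic, and keep only high-scoring pairs for the next training round. All names here (`generate_code_solution`, `critic_score`, the sampling count, the 0.5 threshold) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of critic-guided question-code data construction.
# The model-backed functions are stubs; in the paper's paradigm they would
# be served by the current LLM and a trained code-based critic model.
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Sample:
    question: str
    code: str
    critic_score: float


def run_code(code: str, timeout: float = 5.0) -> str:
    """Execute a candidate solution in a subprocess and capture its stdout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.stdout.strip()
    except subprocess.TimeoutExpired:
        return ""


def generate_code_solution(question: str) -> str:
    """Placeholder: sample a code solution from the current LLM."""
    raise NotImplementedError


def critic_score(question: str, code: str, output: str) -> float:
    """Placeholder: the code-based critic rates a (question, code, output)
    triple with a quality score in [0, 1]."""
    raise NotImplementedError


def build_dataset(questions: list[str], n_samples: int = 4,
                  threshold: float = 0.5) -> list[Sample]:
    """Keep only question-code pairs the critic rates above the threshold,
    yielding self-generated instruction data for the next training round."""
    kept: list[Sample] = []
    for q in questions:
        for _ in range(n_samples):
            code = generate_code_solution(q)
            output = run_code(code)
            score = critic_score(q, code, output)
            if score >= threshold:
                kept.append(Sample(q, code, score))
    return kept
```

The same scored samples could plausibly also supply preference pairs (highest- vs. lowest-scored solutions per question) for the alignment algorithms the abstract mentions, though the exact construction is not specified here.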
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: mathematical NLP, evaluation, NLP datasets, math QA, applications
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English, Chinese
Submission Number: 3556