BoostStep: Boosting Mathematical Capability of Large Language Models via Step-aligned In-Context Learning
Keywords: Mathematical Reasoning, Large Language Models, In-context Learning
TL;DR: BoostStep introduces a step-aligned in-context learning mechanism that effectively enhances the mathematical reasoning performance of SOTA models such as GPT-4o and DeepSeek-R1.
Abstract: Large language models (LLMs) have demonstrated an impressive ability to solve complex mathematical problems through multi-step reasoning, and this ability can be further enhanced with well-designed in-context learning (ICL) examples. However, this potential is often constrained by two major challenges in ICL: granularity mismatch and irrelevant information.
We observe that while LLMs excel at decomposing mathematical problems, they are prone to reasoning errors within individual fine-grained steps. Moreover, ICL examples retrieved at the question level may omit critical steps or even mislead the model with irrelevant details.
To address these issues, we propose BoostStep, a method that enhances reasoning accuracy through step-aligned ICL, a novel mechanism that carefully aligns retrieved reference steps with the corresponding reasoning steps. Additionally, BoostStep incorporates an effective "first-try" strategy to retrieve exemplars highly relevant to the current state of reasoning.
BoostStep is a flexible and powerful method that integrates seamlessly with chain-of-thought (CoT) and tree search algorithms, refining both candidate selection and decision-making. Empirical results show that BoostStep improves GPT-4o’s CoT performance by 4.6\% across mathematical benchmarks, significantly surpassing the 1.2\% gain of traditional few-shot learning. Moreover, it achieves an additional 7.5\% gain when combined with tree search. Surprisingly, it enables state-of-the-art LLMs to solve challenging math problems using simpler examples: it improves the performance of DeepSeek-R1-671B and Qwen3-235B on AIME by 2.2\% and 5.0\% respectively, leveraging only simple examples from the MATH dataset.
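The abstract leaves the retrieval loop implicit. Below is a minimal Python sketch of what step-aligned ICL with a "first-try" strategy could look like: draft the next step without guidance, use the draft (rather than the whole question) to retrieve a similar example step, then regenerate the step with that exemplar as guidance. All identifiers here (`llm_generate`, `embed`, `STEP_BANK`, `boost_step`) are hypothetical stand-ins for illustration, not the authors' released code.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Toy bag-of-characters embedding for illustration only;
    # swap in a real sentence encoder for actual retrieval.
    v = np.zeros(256)
    for ch in text.lower():
        v[ord(ch) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

# Hypothetical step-level example bank: fine-grained solved steps,
# each stored with a precomputed embedding of its content.
STEP_BANK = [
    {"step": "Factor x^2 - 5x + 6 as (x - 2)(x - 3) to find the roots."},
    {"step": "Apply AM-GM: for a, b > 0, a + b >= 2*sqrt(a*b)."},
]
for entry in STEP_BANK:
    entry["emb"] = embed(entry["step"])

def llm_generate(prompt: str) -> str:
    # Placeholder: wire this to whatever LLM API you use.
    raise NotImplementedError

def retrieve_step(query: str, k: int = 1) -> list[str]:
    # Cosine similarity between the query and every example step
    # (embeddings are already unit-norm, so a dot product suffices).
    q = embed(query)
    sims = [float(q @ e["emb"]) for e in STEP_BANK]
    top = np.argsort(sims)[-k:][::-1]
    return [STEP_BANK[i]["step"] for i in top]

def boost_step(question: str, steps_so_far: list[str]) -> str:
    context = "\n".join(steps_so_far)
    # "First try": draft the next step without guidance. The draft reflects
    # the current reasoning state better than the original question does.
    draft = llm_generate(
        f"Problem: {question}\nSteps so far:\n{context}\nNext step:")
    # Retrieve an example step aligned with the draft, not the whole question.
    exemplar = retrieve_step(draft, k=1)[0]
    # Regenerate the step with the step-level exemplar as in-context guidance.
    return llm_generate(
        f"Reference step from a similar sub-problem: {exemplar}\n"
        f"Problem: {question}\nSteps so far:\n{context}\nNext step:")
```

Under this reading, step-level alignment addresses both challenges the abstract names: retrieving against the draft step avoids the granularity mismatch of question-level retrieval, and the retrieved exemplar carries only the sub-problem relevant to the current step rather than a full, possibly distracting solution.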
Primary Area: foundation or frontier models, including LLMs
Submission Number: 3297