MARGE: Improving Math Reasoning with Guided Exploration

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: Large Language Models (LLMs) exhibit strong potential in mathematical reasoning, yet their effectiveness is often limited by a shortage of high-quality queries. This limitation necessitates scaling up training data through self-generated responses, yet current methods struggle with spuriously correlated data caused by ineffective exploration across reasoning stages. To address this challenge, we introduce **MARGE**: Improving **Ma**th **R**easoning with **G**uided **E**xploration, a novel method that enhances mathematical reasoning through hit-guided exploration. MARGE systematically explores intermediate reasoning states derived from self-generated solutions, enabling adequate exploration and improved credit assignment throughout the reasoning process. Notably, MARGE improves both single-shot accuracy and exploration diversity, mitigating a trade-off common in alignment methods. These results demonstrate MARGE's effectiveness in enhancing mathematical reasoning capabilities and unlocking the potential of scaling self-generated training data.
Lay Summary: How do we improve LLMs' reasoning ability when high-quality queries and solutions are scarce? One natural approach is to have LLMs generate more responses for post-training. However, finding effective training data for multi-step tasks like math is hard because of the long reasoning chains involved. The main idea behind MARGE is to use the guidance of an existing correct solution to boost exploration and improve credit assignment. By completing intermediate states, the model obtains a larger training set with fewer spurious correlations, which enables scaling in self-training pipelines. More surprisingly, our method improves both reasoning accuracy and diversity, indicating that previously unexplored patterns are discovered during exploration. Our research demonstrates the benefits and importance of exploration for future studies of LLM reasoning and post-training.
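The core loop described above — branching new rollouts from the intermediate states of a known-correct solution — can be sketched minimally as follows. This is a conceptual illustration, not the authors' implementation; the function and variable names (`guided_exploration`, `generate`, `solution_steps`) are hypothetical, and a real pipeline would sample from an LLM and filter completions by answer correctness.

```python
def guided_exploration(question, solution_steps, generate, n_samples=4):
    """Sketch of hit-guided exploration: for each intermediate state
    (a prefix of a known-correct solution), sample several fresh
    completions, yielding more training data per solved query.
    `generate` stands in for an LLM sampler (hypothetical interface)."""
    data = []
    for cut in range(len(solution_steps)):
        prefix = solution_steps[:cut]  # intermediate reasoning state
        for _ in range(n_samples):
            completion = generate(question, prefix)
            # In practice, keep only completions reaching the right answer.
            data.append((question, prefix, completion))
    return data

# Toy deterministic stand-in for an LLM sampler, for illustration only.
def toy_generate(question, prefix):
    return f"continue-from-step-{len(prefix)}"

samples = guided_exploration("1+1=?", ["step1", "step2"], toy_generate, n_samples=2)
```

With two solution steps and two samples per intermediate state, the sketch yields four (question, prefix, completion) triples, illustrating how one solved query can seed many branched rollouts.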
Link To Code: https://github.com/georgao35/MARGE
Primary Area: Deep Learning->Large Language Models
Keywords: LLM, Reasoning, Reinforcement Learning, Exploration
Submission Number: 1912