Adaptive Monte Carlo Tree Search for High-Quality Process Supervision in Mathematical Reasoning

ACL ARR 2026 January Submission9872 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Mathematical Reasoning, Large Language Model Reasoning
Abstract: The quality of process data plays a key role in training a Process Reward Model (PRM), which can enhance the complex mathematical reasoning capability of large language models. Existing methods rely on vanilla Monte Carlo Tree Search (MCTS) to obtain process labels, which limits their flexibility in node value estimation and path expansion. To address this issue, we propose Adaptive MCTS (AMCTS), a framework that transforms data generation from a fixed, static process into an adaptive, dynamic one at the level of both node value estimation and path expansion. On one hand, AMCTS adaptively refines value estimation by dynamically allocating more samples to uncertain reasoning steps and fewer to confident ones. On the other hand, it enhances path expansion with a temporally adaptive policy that begins with broad exploration and gradually shifts toward exploiting the most promising directions. With AMCTS, we construct \texttt{MathSearch-200K}, a large-scale dataset of about 200K process supervision examples. To evaluate the effectiveness of AMCTS, we conduct comprehensive experiments along two key dimensions: data generation and mathematical reasoning. In data generation, AMCTS produces higher-quality training samples with fewer rollouts than two vanilla MCTS baselines. In mathematical reasoning, a PRM trained on \texttt{MathSearch-200K} consistently outperforms existing baselines across four benchmarks when paired with each of four LLM-based actor models, achieving up to a 6.9\% absolute improvement on MATH500 with \texttt{Llama-3.2-3B-Instruct}. Notably, these gains persist on out-of-distribution benchmarks, demonstrating strong generalization capability. Our code is available at https://anonymous.4open.science/r/AMCTS-7DB4.
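The two adaptive mechanisms described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of rollout-reward variance as the uncertainty measure, the linear budget scaling, and the linear annealing schedule for the UCT exploration weight are all illustrative assumptions.

```python
import math

def adaptive_sample_budget(rewards, base=4, extra=12):
    """Hypothetical helper: allocate more rollouts to uncertain nodes.

    Uncertainty is taken as the sample variance of the node's binary
    rollout rewards observed so far, normalized by 0.25 (the maximum
    variance of a Bernoulli outcome). The budget scales linearly from
    `base` (fully confident) to `base + extra` (maximally uncertain).
    """
    if len(rewards) < 2:
        return base + extra  # no evidence yet: spend the full budget
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / (len(rewards) - 1)
    uncertainty = min(var / 0.25, 1.0)
    return base + round(extra * uncertainty)

def annealed_uct(q, n_child, n_parent, step, total_steps,
                 c0=1.4, c_min=0.2):
    """Hypothetical UCT score whose exploration weight decays over the
    search, shifting from broad exploration toward exploitation."""
    c = c_min + (c0 - c_min) * (1.0 - step / total_steps)
    return q + c * math.sqrt(math.log(n_parent + 1) / (n_child + 1))
```

Under this sketch, a node whose rollouts all agree (e.g. all correct) gets only the base budget, while a node with mixed outcomes receives the full budget, and the same child is scored less optimistically late in the search than early on.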
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Mathematical, Machine Learning for NLP, Language Modeling
Contribution Types: Approaches low compute settings-efficiency, Data resources
Languages Studied: English
Submission Number: 9872