ASE-MCTS: Asynchronous Self-Evaluated Monte Carlo Tree Search for Efficient LLM Reasoning

ACL ARR 2026 January Submission4873 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Large Language Models, LLM Reasoning, Monte Carlo Tree Search, Asynchronous Search
Abstract: While Monte Carlo Tree Search (MCTS) effectively enhances LLM reasoning, its widespread application is constrained by the unreliability of intermediate rewards and prohibitive time costs. Prevailing methods typically secure high-quality signals via external supervision, while intrinsic self-evaluation often suffers from noise and unreliability, limiting the scalability and broad applicability of tree search. To address these challenges, we propose ASE-MCTS, an Asynchronous Self-Evaluated MCTS framework. To improve reward robustness, we introduce a fused reward mechanism that integrates diverse signals, including pairwise comparisons, to enhance both global alignment and local discriminability. Simultaneously, we utilize a novel asynchronous orchestrator that parallelizes independent expansion and simulation tasks, effectively masking generation latency. Extensive experiments demonstrate that ASE-MCTS consistently outperforms strong baselines and achieves capability breakthroughs on high-difficulty benchmarks without external supervision. Moreover, our asynchronous design significantly improves computational efficiency, yielding up to a 6× speedup over synchronous implementations.
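The abstract names two mechanisms without implementation detail on this page. As an illustration only, the sketch below shows the general idea behind each under assumed simplifications: fusing pairwise comparisons into per-candidate win-rate scores, and an asyncio orchestrator that launches independent simulations concurrently so their generation latencies overlap rather than accumulate. All function names, the preference oracle, and the dummy reward are hypothetical stand-ins, not the authors' code.

```python
import asyncio
import itertools

def fused_scores(candidates, prefer):
    """Fuse pairwise comparisons into per-candidate win rates.

    `prefer(a, b)` returns True if a is judged better than b; here it is
    a stand-in for the model's intrinsic self-evaluation signal.
    """
    wins = {c: 0 for c in candidates}
    for a, b in itertools.permutations(candidates, 2):
        if prefer(a, b):
            wins[a] += 1
    n = len(candidates) - 1  # number of opponents each candidate faces
    return {c: wins[c] / n for c in candidates}

async def simulate(node_id, latency=0.01):
    # Stand-in for an LLM rollout; the sleep models generation latency.
    await asyncio.sleep(latency)
    return node_id  # dummy terminal reward

async def orchestrate(node_ids):
    # Launch independent simulation tasks concurrently so their latencies
    # overlap, masking generation time instead of paying it sequentially.
    tasks = [asyncio.create_task(simulate(i)) for i in node_ids]
    return await asyncio.gather(*tasks)

# Toy preference: longer candidate strings are "better".
scores = fused_scores(["a", "bb", "ccc"], lambda x, y: len(x) > len(y))
rewards = asyncio.run(orchestrate(range(4)))
```

With four simulations of equal latency, the concurrent version finishes in roughly one latency unit rather than four, which is the intuition behind the reported speedup over synchronous search.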
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Language Modeling
Contribution Types: NLP engineering experiment, Approaches to low-compute settings (efficiency), Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 4873