When Evolution Meets Momentum: Orchestrating Goal-oriented and Process-oriented reasoning for LLM Inference Scaling

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Inference Scaling, Momentum, Reasoning, LLM Agent and Human
TL;DR: Mixing MCTS and policy evolution in LLM search, borrowing the idea of momentum from optimization to avoid local optima
Abstract: Large language models (LLMs) have demonstrated strong reasoning ability when given additional compute at inference time. However, existing inference-time scaling methods are fundamentally limited by their design. On the one hand, goal-oriented approaches, such as line or tree search, refine candidate solutions using feedback but are vulnerable to sequential dependence, often collapsing into suboptimal reasoning trajectories. On the other hand, process-oriented approaches such as Best-of-N sampling encourage diversity through random exploration but lack feedback mechanisms, leading to inefficient compute allocation and unguided search. In this work, we propose EvoMo, a novel inference-time scaling approach that unifies both paradigms by embedding a globally evolving strategy pool into MCTS, where each node expansion selects reasoning strategies under an $\varepsilon$-soft policy. To further avoid stagnation in familiar strategies, we introduce a \textit{momentum-based optimization} mechanism that monitors similarity among generated solutions and encourages the exploration of underutilized strategies. Across benchmarks, EvoMo achieves significant performance gains over state-of-the-art inference scaling methods.
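The $\varepsilon$-soft selection over a strategy pool, combined with a momentum-style bonus for underutilized strategies, can be sketched as follows. This is a minimal illustrative simplification, not the paper's implementation; the function name `epsilon_soft_select`, the `scores`/`usage` bookkeeping, and the `momentum` penalty term are all assumptions made for exposition.

```python
import random

def epsilon_soft_select(strategies, scores, usage, eps=0.2, momentum=0.5):
    """Illustrative epsilon-soft strategy selection with a momentum-style
    penalty on overused strategies (a hypothetical sketch, not EvoMo itself).

    strategies: list of strategy names in the evolving pool
    scores:     dict mapping strategy -> estimated value (e.g., from feedback)
    usage:      dict mapping strategy -> how often it was selected so far
    """
    # With probability eps, explore: pick uniformly from the pool.
    if random.random() < eps:
        return random.choice(strategies)
    # Otherwise exploit, but discount strategies that dominate recent usage
    # so the search does not stagnate on familiar strategies.
    total = sum(usage.values()) or 1
    def adjusted(s):
        return scores[s] - momentum * (usage[s] / total)
    return max(strategies, key=adjusted)
```

With `eps=0.0` the selection is purely the adjusted score, so a heavily used high-score strategy can lose to a fresher alternative; raising `momentum` strengthens that push toward underutilized strategies.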
Primary Area: foundation or frontier models, including LLMs
Submission Number: 10215