Keywords: Test time scaling, MCTS, LLM
Abstract: As the scaling of large language models (LLMs) during training reaches diminishing returns due to increased resource requirements and limited data availability, focus has shifted toward scalable test-time algorithms. Chain-of-Thought (CoT) reasoning, which enables intermediate reasoning steps in text space, has emerged as a promising approach. However, CoT’s \textbf{single-path exploration} is susceptible to biases and underexploration of the solution space in complex problems. This survey examines advancements in tree search-based methods for enhancing LLM test-time reasoning. Beginning with foundational search algorithms like depth-first search (DFS) and breadth-first search (BFS), we trace the evolution to heuristic-guided approaches and ultimately Monte Carlo Tree Search (MCTS). We introduce \textbf{a unified framework} for comparing these methods, focusing on their core designs, reasoning reward formulations, and targeted applications. Our analysis highlights MCTS's capability to balance exploration and exploitation, overcoming limitations of traditional inference methods like beam search. This survey establishes a foundation for advancing scalable test-time reasoning in LLMs, with implications for improving general-purpose reasoning capabilities.
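The abstract's central contrast is MCTS's balance of exploration and exploitation, which plain BFS/DFS or beam search lack. A minimal illustrative sketch of the standard UCB1 selection rule (the usual MCTS tree policy; this code and its node layout are the editor's assumption, not taken from the surveyed work):

```python
import math

def ucb1(parent, child, c=1.4):
    # UCB1 score: exploitation (mean reward) + exploration bonus.
    # Unvisited children get infinite score so they are tried first.
    if child["visits"] == 0:
        return float("inf")
    return child["value"] / child["visits"] + c * math.sqrt(
        math.log(parent["visits"]) / child["visits"]
    )

def select_child(parent, c=1.4):
    # MCTS selection step: pick the child maximizing UCB1, trading off
    # high observed reward against under-explored branches.
    return max(parent["children"], key=lambda ch: ucb1(parent, ch, c))

# Toy example: two candidate reasoning steps under one node.
parent = {"visits": 10, "children": [
    {"name": "step_a", "visits": 7, "value": 5.0},  # higher mean reward
    {"name": "step_b", "visits": 3, "value": 2.0},  # less explored
]}
print(select_child(parent)["name"])
```

Here the exploration bonus outweighs `step_a`'s higher mean reward, so the less-visited `step_b` is selected; a greedy policy (or beam search's fixed top-k pruning) would commit to `step_a` instead.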
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1202