Keywords: LLM, Parallel Reasoning, Sequential Reasoning, Test-time Scaling
Abstract: Test-time scaling has emerged as a critical driver for advancing Large Language Model (LLM) reasoning, yet current approaches remain bifurcated between sequential scaling and parallel scaling. Sequential methods often struggle with fixed token budgets, leading to premature halting or verbosity, while parallel methods typically lack inter-path coordination. To bridge this gap, we propose SEAT (Semantic Entropy-Guided Adaptive Termination), a training-free framework that synergizes the benefits of both paradigms. Specifically, SEAT adopts a hybrid architecture that explores multiple reasoning paths in parallel while sequentially feeding results from each round into the next to refine the generation process. Our approach is grounded in the observation that Semantic Entropy (SE) correlates strongly and negatively with model accuracy, serving as a reliable proxy for reasoning quality. SEAT leverages this signal to dynamically control the reasoning process, employing a novel threshold-free termination mechanism inspired by the "Secretary Problem" in Optimal Stopping Theory to eliminate pre-sampling overhead. Extensive evaluations across five challenging reasoning benchmarks demonstrate that SEAT significantly boosts performance. Furthermore, our adaptive approach effectively prevents the semantic entropy collapse observed in smaller (7B) models, ensuring robust multi-round reasoning.
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: Mathematical reasoning, scaling, inference methods
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4662