Test-Time Scaling for Multistep Reasoning in Small Language Models via A* Search

Published: 23 Sept 2025, Last Modified: 22 Nov 2025 · LAW · CC BY 4.0
Keywords: machine learning, large language models, small language models, large language model reasoning, hallucination
TL;DR: We propose Test-Time A* Search (TTA*), a framework that equips language models with structured, iterative reasoning via tree-based search. TTA* consistently outperforms zero-shot chain-of-thought on multiple mathematical reasoning tasks.
Abstract: Large language models (LLMs) have demonstrated strong abilities across a wide range of tasks, but they are costly in computation and memory. In contrast, small language models (SLMs) offer significant advantages in efficiency and deployability, yet they typically struggle with complex mathematical reasoning. To address this gap, we present Test-Time A* Search (TTA*), a test-time scaling framework that casts reasoning as goal-directed search over a tree of partial solutions. TTA* is training-free and requires no external supervision or multi-model setup, making it practical in resource-constrained settings. As a drop-in decoding wrapper for SLMs, TTA* systematically explores, critiques, and refines candidate solution paths using the model's own self-reflection capability. Extensive experiments on popular mathematical reasoning benchmarks with a variety of base models show that TTA* consistently improves accuracy and robustness, indicating broad applicability to general mathematical reasoning tasks.
Submission Type: Research Paper (4-9 Pages)
Submission Number: 46
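
The abstract describes TTA* as a goal-directed, A*-style search over a tree of partial solutions, where the model's own self-reflection scores candidate paths. The sketch below illustrates what such a best-first search loop could look like; it is a minimal, hypothetical rendering under assumed interfaces (`propose`, `evaluate`, `is_final` are placeholder callables, not the paper's actual implementation or prompts).

```python
import heapq
from typing import Callable, List, Tuple

def tta_star_search(
    question: str,
    propose: Callable[[str, List[str]], List[str]],   # SLM proposes candidate next steps
    evaluate: Callable[[str, List[str]], float],      # SLM self-reflection: scores a partial chain in [0, 1]
    is_final: Callable[[str], bool],                  # detects a step that states a final answer
    max_expansions: int = 50,
    beam: int = 3,
) -> List[str]:
    """Return the highest-scoring complete reasoning chain found within the budget."""
    # Frontier of partial chains, ordered by negated self-evaluation score
    # (heapq is a min-heap); the counter breaks ties without comparing chains.
    counter = 0
    frontier: List[Tuple[float, int, List[str]]] = [(0.0, counter, [])]
    best_chain: List[str] = []
    best_score = float("-inf")

    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, chain = heapq.heappop(frontier)
        # Expand the most promising partial chain with a few candidate next steps.
        for step in propose(question, chain)[:beam]:
            new_chain = chain + [step]
            score = evaluate(question, new_chain)  # self-reflection acts as the heuristic
            if is_final(step):
                if score > best_score:
                    best_chain, best_score = new_chain, score
            else:
                counter += 1
                heapq.heappush(frontier, (-score, counter, new_chain))
    return best_chain
```

In this reading, the self-evaluation score plays the role of the A* heuristic, steering expansion toward promising partial solutions while the expansion budget and beam width cap the extra test-time compute; the actual scoring, pruning, and refinement details would follow the paper.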