Keywords: language models, reasoning, planning, supervised learning, inference-time scaling
Abstract: Language models generate reasoning sequentially, preventing them from discarding irrelevant exploration paths during search. We introduce Tree-Structured Language Modeling (TSLM), which uses special tokens to encode branching structure, enabling models to generate and selectively expand multiple search paths within a single generation process. By training on complete search trees, including both successful and failed attempts, TSLM learns to internalize systematic exploration without redundant recomputation of shared prefixes. TSLM achieves 100% accuracy on Game of 24 (vs. 17% for a sequential baseline), robust extrapolation to 20×20 grids (91.5% vs. 42.7% for Tree-of-Thought), and superior inference efficiency by avoiding the multiple independent forward passes required by external search methods. These results suggest a new paradigm of inference-time scaling for robust reasoning, demonstrating that supervised learning on complete tree-structured traces provides an efficient alternative for developing systematic exploration capabilities in language models.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 21370