Keywords: language models, reasoning, planning, supervised learning, inference-time scaling
Abstract: Current language models generate solutions through sequential reasoning, limiting their ability to systematically explore multiple solution paths. We introduce Tree-Structured Language Modeling (TSLM), which teaches language models to generate complete search trees within a single generation process, using special tokens to encode branching structure. TSLM serializes tree exploration into linear sequences, enabling standard transformer training on tree-structured reasoning traces that capture both successful and failed solution attempts. Across structured planning (Game of 24, Gridworld) and open-ended reasoning tasks (ProntoQA, GSM8K), TSLM achieves superior performance: 100% accuracy on Game of 24 vs. 17% for sequential baselines, and robust extrapolation to 20×20 grids (76.5%) compared to Tree-of-Thought's collapse (26%). Remarkably, TSLM demonstrates 14× parameter efficiency, with a 0.5B model (68% scaling performance) outperforming 7B sequential baselines (19-26%). TSLM also exhibits emergent capabilities, including unsolvable problem detection and rapid adaptation with minimal training data. These results challenge the assumption that reinforcement learning is necessary for robust reasoning, demonstrating that supervised learning on complete tree-structured traces provides an efficient alternative for developing systematic exploration capabilities in language models.
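The serialization idea described in the abstract can be illustrated with a minimal sketch. Note the marker tokens (`<branch>`, `</branch>`) and outcome tags (`<fail>`, `<success>`) here are illustrative assumptions, not the paper's actual special-token vocabulary, and the Game of 24 trace is a hypothetical example.

```python
def serialize(node):
    """Depth-first serialization of a search tree into a flat token list.
    Each child subtree is wrapped in branch markers, so a linear sequence
    (suitable for standard transformer training) still encodes the tree."""
    parts = [node["content"]]
    for child in node.get("children", []):
        parts.append("<branch>")
        parts.extend(serialize(child))   # recurse into the subtree
        parts.append("</branch>")
    return parts

# Hypothetical Game-of-24 trace with one failed and one successful branch.
tree = {
    "content": "make 24 from [4, 9, 10, 13]",
    "children": [
        {"content": "13-9=4 -> [4,4,10]", "children": [
            {"content": "10-4=6 -> [4,6] <fail>", "children": []},
        ]},
        {"content": "10-4=6 -> [6,9,13]", "children": [
            {"content": "13-9=4 -> [4,6]; 4*6=24 <success>", "children": []},
        ]},
    ],
}

print(" ".join(serialize(tree)))
```

Because failed branches remain in the sequence, a model trained on such traces sees exploration and backtracking, not just the final solution path.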
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 21370