Keywords: Transformers, Tree Search, Teacher Forcing, Reasoning
Abstract: Tree-structured reasoning can improve language models by exploring multiple intermediate thoughts, but the branching, scoring, and backtracking logic is usually supplied by an external search wrapper. We ask whether this search controller can instead be realized within an autoregressive Transformer itself. We formalize internal tree-search execution as teacher-forced next-token prediction over tokenized search trajectories and study three controlled settings: greedy search on explicit trees, reward-ordered depth-first search (DFS) on explicit trees, and DFS control over implicit trees generated by a fixed proposal front end. Across these settings, we construct softmax-Transformer controllers that implement the required primitives, including branch comparison, visited-state detection, forward selection, backtracking, and mode routing, under explicit separation and rounding conditions. We further provide finite-sample excess-risk bounds for norm-bounded Transformer classes and show, for explicit trees, that low teacher-forced control risk implies successful rounded autoregressive rollout. These results show that the core control primitives of tree-structured reasoning are representationally and statistically realizable inside Transformer architectures.
Submission Number: 183