Keywords: Large Reasoning Models, Chain-of-Thought, Tree-of-Thought, Energy Landscapes, Reasoning
Abstract: Since the emergence of large reasoning models (LRMs), reasoning has often been framed as a $\textit{tree-of-thought}$ search, where a model traverses a discrete tree of sub-thoughts within a single chain of thought (CoT), reusing context to perform backtracking and consistency checks. This view presupposes that reasoning is a discrete search over symbolic structures. In this work, we challenge this view by proposing a new framework that conceptualizes LRM inference as $\textit{continuous optimization over an implicit energy landscape}$. Here, intermediate representations correspond to positions in a high-dimensional space, and an implicit energy function encodes progress toward the solution. We motivate this perspective by showing that LRMs, unlike standard LLMs, follow smooth trajectories that make steady progress toward a solution rather than making discrete jumps. We identify $\textit{decision tokens}$ as checkpoints where the model $\textit{explicitly estimates energy}$ and chooses either to exploit a local minimum or to explore by performing larger updates, akin to basin hopping. We further demonstrate that compared to standard tokens, decision tokens operate at a slower frequency and within a distinct activation subspace, suggesting LRMs employ specialized machinery for planning and verification, analogous to the hierarchical cortical processes underlying human System 2 reasoning. Our framework unifies tree-structured reasoning and energy-based models, suggesting new directions to improve LRMs, such as refining energy estimation at decision tokens or tuning checkpoint frequencies to balance exploration and exploitation.
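For intuition, the following is a minimal sketch of the basin-hopping analogy the abstract invokes: alternating local descent (exploitation) with larger perturbations (exploration), with an acceptance decision at each "checkpoint". It is not the submission's method; the energy function, step sizes, and all names below are illustrative assumptions.

```python
import numpy as np

def basin_hopping(energy, x0, n_hops=50, step=0.5, temperature=1.0, seed=0):
    """Toy basin-hopping loop: alternate local descent (exploitation) with
    random jumps (exploration), accepting moves via a Metropolis criterion."""
    rng = np.random.default_rng(seed)

    def local_descent(x, lr=0.05, steps=100, eps=1e-4):
        # Finite-difference gradient descent toward the nearest local minimum.
        for _ in range(steps):
            grad = np.array([
                (energy(x + eps * e) - energy(x - eps * e)) / (2 * eps)
                for e in np.eye(len(x))
            ])
            x = x - lr * grad
        return x

    x_cur = local_descent(np.asarray(x0, dtype=float))
    e_cur = energy(x_cur)
    x_best, e_best = x_cur, e_cur

    for _ in range(n_hops):
        # "Decision point": perturb the current position (explore) ...
        x_trial = local_descent(x_cur + rng.normal(scale=step, size=x_cur.shape))
        e_trial = energy(x_trial)
        # ... then decide whether to accept the new basin (exploit) or keep the old one.
        if e_trial < e_cur or rng.random() < np.exp((e_cur - e_trial) / temperature):
            x_cur, e_cur = x_trial, e_trial
            if e_cur < e_best:
                x_best, e_best = x_cur, e_cur
    return x_best, e_best

if __name__ == "__main__":
    # A rugged 2-D landscape with many local minima and a global minimum at the origin.
    rugged = lambda x: np.sum(x ** 2) + 2.0 * np.sum(np.sin(3.0 * x) ** 2)
    x, e = basin_hopping(rugged, x0=[2.5, -1.5])
    print(f"approx. global minimum near {x}, energy {e:.3f}")
```

In the paper's framing, the local-descent phase corresponds to smooth within-thought trajectories, while the perturb-and-accept step plays the role of a decision token that estimates energy and chooses between exploitation and exploration.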
Primary Area: interpretability and explainable AI
Submission Number: 23731