Modeling Hierarchical Thinking in Large Reasoning Models

Published: 30 Apr 2026, Last Modified: 24 Jun 2026. Venue: ICML 2026 spotlight. License: CC BY 4.0
TL;DR: We model and steer LLM reasoning as outcome-aligned trajectories over FSM states, enabling sentence-level inference-time control via latent interventions.
Abstract: Large Reasoning Models (LRMs) solve complex tasks by generating long Chain-of-Thought (CoT) sequences; however, the emergent dynamics governing their reasoning trajectories are not well understood and can lead to inconsistencies and reasoning pathologies. In this work, we propose to approximate an LRM's emergent hierarchical reasoning dynamics as a trajectory within a Finite State Machine (FSM) that transitions among six abstract cognitive states. We demonstrate that these states and transitions can be captured in the latent state of the model. We believe that this representation has applications in both the interpretability and the optimization of LRMs. For example, by analyzing the topology of these transitions, we identify statistical shifts in reasoning strategies that help distinguish effective reasoning chains from those that fail. To illustrate these potential advantages, we propose $Q$-Value guided steering, a training-free inference-time control method that treats reasoning as a planning problem. We estimate the long-horizon utility of state transitions and apply sparse, orthogonal activation steering at sentence boundaries to align CoT generation with optimal reasoning policies. Experiments across four benchmarks (AIME25, MATH-500, GSM8k, and GPQA Diamond) using three state-of-the-art open reasoning models demonstrate that the $Q$-Value steering policy achieves significant performance gains with "surgical" efficiency, often requiring $25\times$ fewer interventions than greedy and weighted baselines. This suggests that reasoning can be effectively controlled by guiding high-level cognitive dynamics rather than micro-managing token generation. Code is available at: https://github.com/shahariar-shibli/CoT-FSM.
Lay Summary: When Large Reasoning Models (LRMs) solve hard problems like math competitions or scientific questions, they "think out loud", generating long sequences of reasoning steps before arriving at an answer. These thinking sequences can go off track in subtle ways, and we currently have little understanding of *why* some reasoning paths succeed while others fail. We propose modeling the internal reasoning process of an LRM as a simple map with six cognitive "states": *setting up the problem*, *deducing step-by-step*, *introducing new strategies*, *expressing uncertainty*, *backtracking*, and *reaching a conclusion*. By tracking how an LRM *moves* between these states, we can build a statistical picture of *which transitions* tend to lead to correct answers versus wrong ones. Using this map, we developed a steering method that gently nudges the model toward better reasoning moves, but only at the most critical moments, using long-term planning rather than constant intervention. On challenging math and science benchmarks, our method boosts accuracy while intervening up to 25 times less frequently than simpler alternatives, showing that smarter, targeted guidance of high-level thinking is far more effective than constant low-level corrections.
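To make the idea of "estimating the long-horizon utility of state transitions" concrete, here is a minimal sketch in Python. It is not the authors' implementation: the state labels, the function names `estimate_q` and `best_next_state`, the discount factor, and the toy trajectories are all assumptions made for illustration. The sketch uses a plain Monte Carlo average of discounted outcomes, which may differ from the estimator used in the paper, and it covers only the tabular value estimate, not the activation steering step.

```python
from collections import defaultdict

# Hypothetical short labels for the six states named in the lay summary.
STATES = ["setup", "deduce", "strategy", "uncertainty", "backtrack", "conclude"]


def estimate_q(trajectories, gamma=0.9):
    """Monte Carlo estimate of Q(s, s') from outcome-labelled trajectories.

    Each trajectory is a pair (states, correct), where `states` is the
    sentence-level state sequence of one reasoning chain and `correct`
    says whether its final answer was right. A transition's return is the
    final reward (1.0 or 0.0) discounted by the number of transitions
    that still follow it in the chain.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for states, correct in trajectories:
        reward = 1.0 if correct else 0.0
        n_transitions = len(states) - 1
        for t in range(n_transitions):
            key = (states[t], states[t + 1])
            steps_to_end = n_transitions - 1 - t
            totals[key] += (gamma ** steps_to_end) * reward
            counts[key] += 1
    # Average the discounted returns observed for each transition.
    return {key: totals[key] / counts[key] for key in counts}


def best_next_state(q, current):
    """Return the successor of `current` with the highest estimated Q-value,
    or None if no transition out of `current` was observed."""
    candidates = {nxt: v for (cur, nxt), v in q.items() if cur == current}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy, hand-written trajectories (assumed data, not results from the paper).
toy_trajectories = [
    (["setup", "deduce", "conclude"], True),
    (["setup", "uncertainty", "backtrack", "deduce", "conclude"], True),
    (["setup", "uncertainty", "uncertainty", "conclude"], False),
]

q_values = estimate_q(toy_trajectories, gamma=0.9)
for (cur, nxt), value in sorted(q_values.items()):
    print(f"Q({cur} -> {nxt}) = {value:.4f}")
print("preferred move after 'uncertainty':", best_next_state(q_values, "uncertainty"))
```

In a steering setup along the lines described above, a table like `q_values` could be consulted at each sentence boundary, with an intervention applied only when the preferred next state differs from the one the model is about to enter. That gating rule is an assumption of this sketch.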
Primary Area: Deep Learning->Large Language Models
Keywords: reasoning, hierarchical thinking, finite state abstraction, reasoning dynamics, activation steering, Q-value policy, representation engineering, thinking, CoT, Chain of Thought
Submission Number: 28662