Keywords: Reasoning dynamics, Hidden Markov Model, Transition analysis, Large language models, Interpretability.
Abstract: Reasoning in language models involves both explicit steps in the generated text and implicit structural shifts in hidden states, yet their joint dynamics remain largely underexplored. We introduce an Explicit–Implicit Reasoning Lens (EIRL) that jointly models these dimensions: at the explicit stage, EIRL captures transitions between reasoning roles, and at the implicit stage, it models latent depth regimes that reveal how computation is allocated across layers within each role. By linking what function a reasoning step serves to where it arises in the network, our approach provides a unified lens for understanding both reasoning dynamics and their underlying mechanisms. Once trained on reasoning trajectories, EIRL learns, via hidden Markov modeling, probabilistic transition patterns that characterize how models typically move between reasoning roles and allocate computation across layers. Our analysis reveals a clear internal-to-external progression in reasoning. At the implicit stage, hidden states organize into distinct depth patterns that differ across reasoning categories, indicating that the model allocates its layers differently depending on the functional role of the step. These internal configurations then give rise to the explicit stage, where the model expresses its reasoning through semantic transitions. This progression diverges between trajectories that succeed and those that fail to reach the correct answer. Leveraging the explicit–implicit reasoning structure captured by EIRL, our framework supports both causal interventions that steer models toward targeted reasoning paths and interpretability analyses that reveal how different external intervention strategies reorganize the semantic flow of reasoning to produce their observed effects.
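The abstract describes fitting a hidden Markov model over per-step hidden-state features so that latent states act as depth regimes and the learned transition matrix summarizes how the model moves between them. The snippet below is a minimal sketch of that general idea under assumptions of our own, not the authors' released code: the function name `fit_depth_regimes`, the use of hmmlearn's `GaussianHMM`, and the input `step_features` (one array per reasoning trajectory, of shape steps × layers, holding a per-layer activation statistic for each step) are all hypothetical stand-ins.

```python
# Minimal sketch (hypothetical, not the paper's implementation): fit an HMM
# whose hidden states serve as latent "depth regimes" over reasoning steps,
# then inspect the learned regime-to-regime transition matrix.
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_depth_regimes(step_features, n_regimes=4, seed=0):
    """Fit a Gaussian HMM to per-step layer-allocation features.

    step_features: list of arrays, one per trajectory, shape (num_steps, num_layers).
    """
    X = np.concatenate(step_features, axis=0)         # stack all steps into one matrix
    lengths = [len(traj) for traj in step_features]   # trajectory boundaries for the HMM
    hmm = GaussianHMM(n_components=n_regimes,
                      covariance_type="diag",
                      n_iter=100,
                      random_state=seed)
    hmm.fit(X, lengths)
    return hmm

# Usage with synthetic data standing in for real hidden-state statistics.
rng = np.random.default_rng(0)
toy_trajectories = [rng.normal(size=(rng.integers(5, 12), 32)) for _ in range(20)]
model = fit_depth_regimes(toy_trajectories)
print("Regime transition matrix:\n", model.transmat_)        # implicit-stage dynamics
print("Most likely regimes:", model.predict(toy_trajectories[0]))
```

In this sketch the transition matrix plays the role the abstract assigns to EIRL's probabilistic transition patterns; the explicit stage (transitions between reasoning roles) would analogously be summarized by a transition matrix over role labels.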
Supplementary Material: zip
Primary Area: interpretability and explainable AI
Submission Number: 21865