A Statistical Physics of Language Model Reasoning

Published: 01 Jul 2025, Last Modified: 11 Jul 2025
Venue: ICML 2025 R2-FM Workshop Poster
License: CC BY 4.0
Keywords: Transformer Interpretability, Stochastic Dynamics, Regime Switching, Chain-of-Thought Reasoning, AI safety
TL;DR: This paper models transformer reasoning as a stochastic dynamical system with regime-switching, tracking adversarial belief shifts and failure modes to enable AI safety analysis and jailbreak detection.
Abstract: Transformer LMs show emergent reasoning that resists mechanistic understanding. We offer a statistical physics framework for continuous-time chain-of-thought reasoning dynamics. We model sentence-level hidden state trajectories as a stochastic dynamical system on a lower-dimensional manifold. This drift-diffusion system uses latent regime switching to capture diverse reasoning phases, including misaligned states or failures. Empirical trajectories (8 models, 7 benchmarks) show that a rank-40 projection, chosen to balance variance capture and feasibility, explains about 50% of the variance; we use this computationally tractable reduction not to claim inherent anisotropy but to make SDE parameter estimation feasible. We find four latent reasoning regimes. A switching linear dynamical system (SLDS) model is formulated and validated to capture these features. The framework enables low-cost reasoning simulation, offering tools to study and predict critical transitions such as misaligned states or other LM failures.
Submission Number: 137
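
The regime-switching drift-diffusion picture in the abstract can be illustrated with a minimal simulation sketch. It is not the paper's model or its fitted parameters: the function name `simulate_slds`, the sticky transition probability `stay`, the per-regime contraction factors, offsets, and noise scales are all hypothetical values chosen for illustration, and the continuous-time dynamics are replaced by a simple discrete-time step. Only the state dimension (40) and the number of regimes (4) are taken from the abstract.

```python
import random


def simulate_slds(n_steps=200, dim=40, n_regimes=4, stay=0.95, seed=0):
    """Simulate a toy switching linear dynamical system.

    Update rule (an assumption, not the paper's estimated model):
        z_t ~ Categorical(P[z_{t-1}])
        x_t = a_k * x_{t-1} + b_k + s_k * eps,   k = z_t,  eps ~ N(0, I)
    """
    rng = random.Random(seed)

    # Hypothetical "sticky" Markov chain: stay in the current regime with
    # probability `stay`, otherwise jump uniformly to another regime.
    off = (1.0 - stay) / (n_regimes - 1)
    P = [[stay if i == j else off for j in range(n_regimes)]
         for i in range(n_regimes)]

    # Hypothetical per-regime parameters: an isotropic linear drift
    # (A_k = a_k * I), a constant offset b_k, and a diffusion scale s_k.
    decay = [0.90 - 0.05 * k for k in range(n_regimes)]
    offset = [[rng.gauss(0.0, 0.1) for _ in range(dim)]
              for _ in range(n_regimes)]
    noise = [0.05 * (k + 1) for k in range(n_regimes)]

    z = [0]               # latent regime sequence, starting in regime 0
    x = [[0.0] * dim]     # projected hidden-state trajectory, starting at 0
    for _ in range(1, n_steps):
        # Sample the next regime from the row of the current regime.
        k = rng.choices(range(n_regimes), weights=P[z[-1]])[0]
        prev = x[-1]
        # Linear drift plus Gaussian diffusion under regime k.
        x.append([decay[k] * prev[i] + offset[k][i]
                  + noise[k] * rng.gauss(0.0, 1.0)
                  for i in range(dim)])
        z.append(k)
    return z, x, P


if __name__ == "__main__":
    z, x, P = simulate_slds()
    print("regimes visited:", sorted(set(z)))
    print("trajectory shape:", len(x), "x", len(x[0]))
```

Each regime here has its own drift and noise level, so a change in the latent label `z` shows up as a change in how the trajectory `x` evolves; this is the sense in which a switch could stand in for a transition between reasoning phases.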