Fluid Reasoning Representations

20 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: LLM, Reasoning, Chain-of-Thought, Mechanistic Interpretability, Steering, Abstract Reasoning, Planning, BlocksWorld
TL;DR: Reasoning models like QwQ-32B progressively adapt their internal representations during extended reasoning to develop abstract, symbolic encodings that enable better performance on obfuscated planning tasks.
Abstract: Traditional large language models struggle with abstract reasoning tasks. By generating extended chains of thought, reasoning models such as OpenAI's o1 and o3 achieve dramatic accuracy improvements. However, the internal transformer mechanisms underlying this superior performance remain poorly understood. This work presents an early mechanistic analysis of how reasoning models process abstract structural information during extended reasoning. We analyze QwQ-32B on Mystery BlocksWorld -- a semantically obfuscated benchmark that measures planning and reasoning capabilities. We find that QwQ gradually improves its internal understanding of actions and concepts over its extended rollouts, developing abstract representations that capture structure rather than specific action names. Through steering experiments, we establish causal evidence that these adaptations improve problem solving: injecting refined representations from successful traces enhances accuracy, while symbolic representations can replace many of the obfuscated Mystery BlocksWorld encodings with minimal performance loss. We conclude that one factor driving reasoning-model performance is in-context refinement of token representations -- which we call Fluid Reasoning Representations. This provides an early mechanistic account of how reasoning models work.
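For readers unfamiliar with the steering methodology the abstract refers to, the sketch below shows one common way such an injection can be implemented: adding a saved hidden-state vector to a decoder layer's residual stream via a PyTorch forward hook. The layer index, steering scale, artifact file name, and the choice to inject at every position are illustrative assumptions for a generic HuggingFace-style checkpoint, not the paper's exact procedure.

```python
# Minimal activation-steering sketch, assuming a HuggingFace-style QwQ-32B
# checkpoint and a precomputed "refined" representation (e.g., the hidden
# state of an obfuscated action token captured late in a successful trace).
# LAYER, ALPHA, and the artifact path are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/QwQ-32B"
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

LAYER = 40   # assumed injection layer
ALPHA = 4.0  # assumed steering strength
# Hypothetical artifact: a (hidden_size,) vector saved from a successful rollout.
steer_vec = torch.load("refined_action_rep.pt")

def add_steering(module, inputs, output):
    """Add the refined representation to the layer's residual-stream output.

    Injected at every sequence position here for simplicity; a real
    experiment might target only the obfuscated action tokens.
    """
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * steer_vec.to(hidden.device, hidden.dtype)
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

handle = model.model.layers[LAYER].register_forward_hook(add_steering)
try:
    prompt = "<Mystery BlocksWorld problem statement>"  # placeholder input
    ids = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**ids, max_new_tokens=2048)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # restore the unsteered model
```

The hook is removed in a finally block so that accuracy comparisons between steered and unsteered runs are not contaminated by a leftover intervention.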
Primary Area: interpretability and explainable AI
Submission Number: 22953