Keywords: Chain of Thought/Reasoning models, Causal interventions, Steering
TL;DR: Reasoning models like QwQ-32B progressively adapt their internal representations during extended reasoning to develop abstract, symbolic encodings that enable better performance on obfuscated planning tasks.
Abstract: Classic large language models struggle with abstract reasoning tasks, while reasoning models like OpenAI's o1 and o3 that generate extended chains of thought achieve dramatic improvements. However, the mechanisms underlying this superior performance remain poorly understood. This work presents a mechanistic analysis of how reasoning models process abstract structural information during extended reasoning. We analyze QwQ-32B on Mystery BlocksWorld, a semantically obfuscated planning domain, and find that reasoning models progressively refine their internal representations of actions and predicates throughout 15-20k token traces, converging toward abstract symbolic encodings independent of surface semantics.
Through steering experiments, we establish causal evidence that these adaptations improve problem solving: injecting refined representations from successful traces enhances accuracy, while symbolic representations can replace naming-specific encodings with minimal performance loss. Our findings reveal that reasoning models' superior performance stems partly from their ability to dynamically construct problem-specific representational spaces during extended reasoning, providing early mechanistic insights into chains of thought.
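To make the steering intervention concrete, the sketch below shows one common way such an injection can be implemented: adding a precomputed steering vector to the residual stream of a chosen decoder layer via a forward hook. This is a minimal illustration, not the paper's exact procedure; the layer index, steering strength, and the way the steering vector is derived from successful traces are all assumptions for illustration.

```python
# Minimal activation-steering sketch (illustrative only; not the authors' exact method).
# Assumes a HuggingFace causal LM whose decoder layers live at model.model.layers and a
# precomputed `steering_vector` (e.g., mean hidden state from successful reasoning traces
# minus the mean from unsuccessful ones) with size hidden_dim.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/QwQ-32B"   # model analyzed in the paper
LAYER_IDX = 40                # hypothetical injection layer
ALPHA = 4.0                   # hypothetical steering strength

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto"
)

hidden_dim = model.config.hidden_size
# Placeholder; in practice this would be computed from cached trace activations.
steering_vector = torch.randn(hidden_dim)

def add_steering(module, inputs, output):
    # Decoder layers return a tuple whose first element is the residual-stream hidden states.
    hidden = output[0]
    hidden = hidden + ALPHA * steering_vector.to(hidden.device, hidden.dtype)
    return (hidden,) + output[1:]

handle = model.model.layers[LAYER_IDX].register_forward_hook(add_steering)
try:
    prompt = "Solve the Mystery BlocksWorld instance: ..."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()  # detach the hook so subsequent generations are unsteered
```

Comparing accuracy on held-out obfuscated planning instances with and without the hook attached is one way to test for the causal effect described above.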
Submission Number: 143