Separable Representations of Task Complexity and Deliberation in Reasoning Language Models.
Keywords: Mechanistic interpretability, Activation steering, Reasoning language models, Representation engineering
Abstract: Reasoning-capable large language models (LLMs) produce explicit chain-of-thought (CoT) traces that scale with problem difficulty. While essential for complex reasoning, this behavior introduces unnecessary latency on simpler tasks. Traditional methods for shortening CoT, such as "think fast" prompting, often degrade accuracy on difficult problems. This paper investigates whether a targeted intervention is possible by examining how models internally represent task complexity and allocate reasoning length before generation begins.
We show that task complexity is encoded with non-linear ordinal structure in pre-reasoning residual-stream activations: PCA visualizations reveal a U-shaped arrangement consistent with a continuous complexity gradient, and a non-linear probe classifies complexity above chance. We then extract a "deliberation-associated" direction via contrastive prompting and apply a nullspace projection against confound directions to remove components such as instruction compliance and style framing. Using a geometric alignment test, we find no evidence of geometric entanglement between the purified direction and the non-linear structure of complexity assessment. Finally, we test this separation through activation steering: negative steering along the purified direction reduces reasoning tokens by up to 22% on MATH-500 and GPQA-Diamond, with smaller effects on code generation, while accuracy is preserved to within ~1% in most settings. Our results indicate that deliberation extent and task complexity occupy separable representational subspaces, enabling targeted modulation of LLM deliberation.
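The pipeline described in the abstract (contrastive direction, nullspace projection against confounds, then steering) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the contrastive direction is computed, which layer is used, or how steering is scaled. The difference-of-means estimator, the function names `contrastive_direction`, `purify_direction`, and `steer`, and the coefficient `alpha` are all assumptions made here for illustration, and the confound directions are assumed to be linearly independent.

```python
import numpy as np


def contrastive_direction(pos_acts, neg_acts):
    """Difference of mean activations between two prompt sets.

    ASSUMPTION: difference-of-means is one common way to build a
    contrastive direction; the abstract does not specify the estimator.
    pos_acts, neg_acts: arrays of shape (n_prompts, d_model).
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def purify_direction(direction, confounds):
    """Project `direction` onto the nullspace of the confound directions.

    confounds: array of shape (k, d_model), assumed linearly independent.
    Returns a unit vector orthogonal to every confound row.
    """
    # Orthonormal basis for the confound span; q has shape (d_model, k).
    q, _ = np.linalg.qr(confounds.T)
    # Subtract the component of `direction` lying in the confound span.
    purified = direction - q @ (q.T @ direction)
    return purified / np.linalg.norm(purified)


def steer(hidden, direction, alpha):
    """Shift activations along `direction`; alpha < 0 is negative steering.

    hidden: array of shape (n_tokens, d_model). `alpha` is a hypothetical
    steering coefficient, not a value taken from the paper.
    """
    return hidden + alpha * direction
```

Because the purified direction is orthogonal to each confound direction, steering along it leaves the activations' projections onto those confounds unchanged, which is the property the nullspace step is meant to provide.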
Submission Number: 201