Lost in Projection: The Geometric Orthogonality of LLM Assessment and Reasoning

ACL ARR 2026 January Submission 8001 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: large language models, mechanistic interpretability, representation engineering, reasoning
Abstract: Large language models (LLMs) frequently exhibit a dissociation between their internal confidence and actual reasoning competence. We investigate the mechanistic origin of this phenomenon by analyzing the geometry of residual stream activations across two phases: \textit{pre-generative assessment} and \textit{solution execution}. Using linear probing and principal component analysis across three model families (Llama, Qwen, Mistral), we identify two distinct geometric structures: a high-dimensional \textbf{Assessment Subspace} that encodes solvability beliefs, and a low-dimensional \textbf{Execution Subspace} that governs reasoning dynamics. While the belief state is robustly decodable during prompt processing, we observe a sharp \textbf{dimensionality shift} at the onset of generation, where the active variance transitions to the low-dimensional execution manifold. We validate this decoupling through causal intervention: steering vectors applied to the assessment subspace induce a decisive shift in the internal belief state ($\Delta > 0.8$), yet fail to alter downstream reasoning accuracy. Crucially, this inertness persists even on ``borderline'' tasks where the model possesses the requisite capability to solve the problem. These findings suggest a modular architecture where high-level assessment states are geometrically orthogonal to the procedural dynamics of execution, explaining why increasing internal confidence does not translate to improved competence.
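The probing-and-steering setup described in the abstract can be illustrated on synthetic data. The sketch below is not the paper's actual pipeline: it substitutes random vectors with a planted "belief" direction for real residual-stream activations, and uses mass-mean probing (the difference of class means) as the linear probe. All names (`belief_dir`, `probe`, `alpha`) are illustrative assumptions. It shows the abstract's core geometric point: adding a steering vector along the probe direction shifts the decoded belief score by a controllable amount, independently of anything else in the representation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64   # hypothetical residual-stream width
n = 200  # synthetic prompts per class

# Synthetic stand-in for residual activations: "solvable" vs "unsolvable"
# prompts differ by +/-1 along a planted unit belief direction.
belief_dir = rng.normal(size=d)
belief_dir /= np.linalg.norm(belief_dir)
labels = np.repeat([0, 1], n)
acts = rng.normal(size=(2 * n, d)) + np.outer(labels * 2.0 - 1.0, belief_dir)

# Mass-mean linear probe: normalized difference of class means.
probe = acts[labels == 1].mean(0) - acts[labels == 0].mean(0)
probe /= np.linalg.norm(probe)

score = acts @ probe
acc = float(((score > score.mean()) == labels).mean())

# Steering intervention: push "unsolvable" activations along the probe
# direction and measure the shift in the decoded belief score.
alpha = 2.0
steered = acts[labels == 0] + alpha * probe
delta = float((steered @ probe).mean() - score[labels == 0].mean())
print(f"probe accuracy: {acc:.2f}, belief shift: {delta:.2f}")
```

Because `probe` is unit-norm, the induced shift in the decoded score equals `alpha` exactly; in the paper's framing, the open question is whether such a shift in the assessment subspace has any effect on execution, which this toy setup does not model.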
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: Interpretability and Analysis of Models, Reasoning, Safety and Reliability of LLMs, Representational Geometry
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 8001