Latent-Aligned Manifolds in Language Models: Geometry, Recovery, and Prompt Translations

ACL ARR 2026 January Submission 1712 Authors

31 Dec 2025 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: mechanistic interpretability, representation manifolds, latent state, prompt effects, unsupervised recovery, geometry of representations, large language models
Abstract: Understanding how large language models maintain and manipulate internal task state remains a central challenge in mechanistic interpretability. Sequential tasks are known to induce low-dimensional structure in activation space, yet how task-defined state, representation geometry, and prompt modulation interact is poorly understood. We introduce a formal framework that links task-induced latent states to latent-aligned manifolds in the residual stream, in which representations concentrate near low-dimensional trajectories reflecting latent-state progression. Building on this perspective, we develop Auto-Latent, an unsupervised method that recovers ordered latent-aligned structure directly from activations, without access to task rules or handcrafted annotations. Across controlled state-tracking tasks, system-level prompts act primarily as approximately translational offsets on task manifolds, preserving geometric structure while shifting its embedding in representation space. These translation-dominated responses and latent-aligned geometries persist under unsupervised recovery, indicating that prompt modulation and internal task state are governed by intrinsic geometric properties of model computation rather than by task-specific annotation artifacts.
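The "approximately translational offset" claim can be sketched numerically. The snippet below is a minimal illustration, not the paper's method: it uses synthetic stand-in data (`base`, `prompted`, `true_offset` are all invented here) in place of real residual-stream activations, estimates the prompt's effect as a single mean-difference translation, and checks how much of the prompt-induced change that translation explains and how well pairwise geometry is preserved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: activations for the same inputs under a base
# prompt and under a system prompt. In the paper's setting these would come
# from a language model's residual stream; here we synthesize a point cloud
# and a (mostly) translational prompt response plus small noise.
n_points, d_model = 200, 32
base = rng.normal(size=(n_points, d_model))      # points on the "task manifold"
true_offset = rng.normal(size=d_model)           # translation induced by the prompt
prompted = base + true_offset + 0.01 * rng.normal(size=(n_points, d_model))

# Estimate the prompt's effect as a single translation: the mean difference.
est_offset = (prompted - base).mean(axis=0)

# Fraction of the prompt-induced change explained by the pure translation.
delta = prompted - base
residual = delta - est_offset
explained = 1.0 - (residual ** 2).sum() / (delta ** 2).sum()
print(f"variance explained by translation: {explained:.4f}")

# Geometry preservation: pairwise distances should be nearly unchanged,
# since a translation leaves the manifold's internal shape intact.
def pdist(x):
    """Full pairwise Euclidean distance matrix."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

dist_drift = np.abs(pdist(base) - pdist(prompted)).max()
print(f"max pairwise-distance drift: {dist_drift:.4f}")
```

On real activations, a high explained fraction together with small distance drift would support the translation-dominated picture; a low fraction would indicate the prompt also rotates or deforms the manifold.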
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: probing, knowledge tracing/discovering/inducing, robustness
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 1712