Hypothesis-Driven Feature Manifold Analysis in LLMs via Supervised Multi-Dimensional Scaling

TMLR Paper6137 Authors

07 Oct 2025 (modified: 10 Feb 2026) · Decision pending for TMLR · CC BY 4.0

Abstract: The linear representation hypothesis states that language models (LMs) encode concepts as directions in their latent space, forming organized, multidimensional manifolds. Prior work has largely focused on identifying specific geometries for individual features, limiting its ability to generalize. We introduce Supervised Multi-Dimensional Scaling (SMDS), a model-agnostic method for evaluating and comparing competing feature manifold hypotheses. We apply SMDS to temporal reasoning as a case study and find that different features instantiate distinct geometric structures, including circles, lines, and clusters. SMDS reveals several consistent characteristics of these structures: they reflect the semantic properties of the concepts they represent, remain stable across model families and sizes, actively support reasoning, and dynamically reshape in response to contextual changes. Together, our findings shed light on the functional role of feature manifolds, supporting a model of entity-based reasoning in which LMs encode and transform structured representations.

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission:
Following the reviewers' suggestions, we have:
- Clarified the visualizations and the K-Fold setting used to obtain the stress scores throughout the paper;
- Analyzed the variability of our dataset compared to previous studies to validate our design (Section 4);
- Moved evidence for the findings from the Appendix to the main paper;
- Performed a statistical rank analysis on the manifolds across all models (Figure 5, Section 5.1);
- Toned down the claims on logarithmic spacing (Section 5.1);
- Clarified the role of the semantic flow phenomenon introduced in Section 5.2 and further described in Section 5.4;
- Clarified the choice of layers for the intervention experiment (Section 5.3);
- Updated the duration 2D manifold plot to better show the two features, and added supporting stress values (Section 5.4).
Following the Action Editor's requests, we have:
- Rephrased "Feature Manifold Discovery" as "Feature Manifold Analysis" and made clearer that the scope of SMDS lies in evaluating specific geometric hypotheses;
- Added references to RTD AE and RTD-Lite AE in the related work;
- Established stronger statistical guarantees via bootstrapping, identifying a single statistically significant manifold for all datasets of the study (save for date_temperature, whose case we discuss). The statistical analysis and its confidence intervals are discussed in the paper and in a new appendix;
- Added a study on sensitivity to the dimensionality $m$;
- Expanded the discussion on how statistical significance can be used as a decision rule for selecting the best manifold. In cases where a single best manifold cannot be identified, we recommend developing new hypotheses and running further causal studies.
All relevant changes are shown in red (for the reviewers' changes) or blue (for the Action Editor's changes).

Code: https://github.com/UKPLab/tmlr2026-manifold-analysis

Assigned Action Editor: ~Serguei_Barannikov1

Submission Number: 6137
