Automatic Feature Manifold Discovery in LLMs via Supervised Multi-Dimensional Scaling

TMLR Paper6137 Authors

07 Oct 2025 (modified: 28 Nov 2025) · Under review for TMLR · CC BY 4.0
Abstract: The linear representation hypothesis states that language models (LMs) encode concepts as directions in their latent space, forming organized, multidimensional manifolds. Prior efforts focus on discovering specific geometries for specific features, and thus generalize poorly. We introduce Supervised Multi-Dimensional Scaling (SMDS), a model-agnostic method to automatically discover feature manifolds. We apply SMDS to temporal reasoning as a case study, finding that different features form various geometric structures such as circles, lines, and clusters. SMDS yields several insights into these structures: they consistently reflect the properties of the concepts they represent, are stable across model families and sizes, actively support reasoning in models, and dynamically reshape in response to context changes. Together, our findings shed light on the functional role of feature manifolds, supporting a model of entity-based reasoning in which LMs encode and transform structured representations.
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Following the reviewer's suggestions, we have:
- Clarified visualizations and K-Fold setting used to obtain the stress scores throughout the paper;
- Performed an analysis on variability of our dataset compared to previous studies to validate our design (Section 4);
- Moved evidence for the findings from Appendix to main paper;
- Performed a statistical rank analysis on the manifolds across all models (Figure 5, Section 5.1);
- Toned down the claims on logarithmic spacing (Section 5.1);
- Clarified the role of the semantic flow phenomenon introduced in Section 5.2 and further described in 5.4;
- Clarified the choice of layers for the intervention experiment (Section 5.3);
- Updated the duration 2D manifold plot to better show the two features, and added supporting stress values (Section 5.4).
All relevant changes are shown in red.
Assigned Action Editor: ~Serguei_Barannikov1
Submission Number: 6137