Tracing the complexity profiles of different linguistic phenomena through the intrinsic dimension of LLM representations

ACL ARR 2026 January Submission101 Authors

21 Dec 2025 (modified: 20 Mar 2026)ACL ARR 2026 January SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: LLM analysis, intrinsic dimensionality, linguistic complexity
Abstract: We explore the intrinsic dimension (ID) of LLM representations as a marker of linguistic complexity, asking if different ID profiles across LLM layers differentially characterize formal and functional complexity. We find the formal contrast between sentences with multiple coordinated or subordinated clauses to be reflected in ID differences whose onset aligns with a phase of more abstract linguistic processing independently identified in earlier work. The functional contrasts between sentences characterized by right branching vs. center embedding or unambiguous vs. ambiguous relative clause attachment are also picked up by ID, but in a less marked way, and they do not correlate with the same processing phase. Further experiments using representational similarity and layer ablation confirm the same trends. We conclude that ID is a useful marker of linguistic complexity in LLMs, that it allows to differentiate between different types of complexity, and that it points to similar stages of linguistic processing across disparate LLMs.
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: Language Modeling, Linguistic Theories, Cognitive Modeling, and Psycholinguistics, Syntax: Tagging, Chunking and Parsing
Contribution Types: Model analysis & interpretability, Data resources
Languages Studied: English
Submission Number: 101
Loading