Tracing the Representation Geometry of Language Models from Pretraining to Post-training

Published: 18 Sept 2025, Last Modified: 29 Oct 2025, NeurIPS 2025 poster, CC BY 4.0
Keywords: Large Language Models, Phases of learning, Representation Learning, Learning Dynamics, Representation Geometry, Spectral Analysis, Memorization, Generalization, Pretraining, Post-training
TL;DR: While loss decreases monotonically during LLM training, the representations undergo distinct geometric phases across pretraining and post-training, which in turn determine when and how the model acquires memorization or generalization capabilities.
Abstract: Standard training metrics like loss fail to explain the emergence of complex capabilities in large language models. We take a spectral approach to investigate the geometry of learned representations across pretraining and post-training, measuring effective rank (RankMe) and eigenspectrum decay (αReQ). With OLMo (1B-7B) and Pythia (160M-12B) models, we uncover a consistent non-monotonic sequence of three geometric phases during autoregressive pretraining. The initial “warmup” phase exhibits rapid representational collapse. This is followed by an “entropy-seeking” phase, where the manifold’s dimensionality expands substantially, coinciding with peak n-gram memorization. Subsequently, a “compression-seeking” phase imposes anisotropic consolidation, selectively preserving variance along dominant eigendirections while contracting others, a transition marked by significant improvement in downstream task performance. We show these phases can emerge from a fundamental interplay of cross-entropy optimization under skewed token frequencies and representational bottlenecks (d ≪ |V|). Post-training further transforms geometry: SFT and DPO drive “entropy-seeking” dynamics to integrate specific instructional or preferential data, improving in-distribution performance while degrading out-of-distribution robustness. Conversely, RLVR induces “compression-seeking” dynamics, enhancing reward alignment but reducing generation diversity.
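The two spectral diagnostics named above can be illustrated with a short sketch. This is a minimal, hypothetical implementation assuming RankMe is the exponentiated entropy of the normalized singular values of a feature matrix, and that the decay exponent α is obtained by a log-log least-squares fit of the covariance eigenspectrum; the variable names and the fitting window (skipping the first few eigenvalues) are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch of two spectral diagnostics over a (num_tokens x hidden_dim) feature matrix:
# effective rank (RankMe-style) and an eigenspectrum power-law decay exponent alpha.
import numpy as np


def effective_rank(features: np.ndarray, eps: float = 1e-12) -> float:
    """exp(-sum_k p_k log p_k), where p_k are singular values normalized to sum to 1."""
    sigma = np.linalg.svd(features, compute_uv=False)
    p = sigma / (sigma.sum() + eps)
    entropy = -(p * np.log(p + eps)).sum()
    return float(np.exp(entropy))


def spectral_decay_alpha(features: np.ndarray, k_min: int = 10) -> float:
    """Fit lambda_k ~ k^(-alpha) to covariance eigenvalues in log-log space.

    Skipping the first k_min indices is an assumption made here to reduce the
    influence of the largest, most anisotropic eigendirections on the fit.
    """
    cov = np.cov(features, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]      # sort descending
    eigvals = eigvals[eigvals > 0]               # keep strictly positive eigenvalues
    k = np.arange(1, eigvals.size + 1)
    mask = k >= k_min
    slope, _ = np.polyfit(np.log(k[mask]), np.log(eigvals[mask]), deg=1)
    return float(-slope)                         # larger alpha = faster decay


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy features standing in for hidden states collected over a token batch.
    feats = rng.standard_normal((4096, 256)) @ rng.standard_normal((256, 256))
    print(f"RankMe ~ {effective_rank(feats):.1f}, alpha ~ {spectral_decay_alpha(feats):.2f}")
```

Under this reading, the "entropy-seeking" phase would show effective rank rising (a flatter spectrum, smaller α), while the "compression-seeking" phase would show α increasing as variance concentrates in the leading eigendirections.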
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 19300