Keywords: Large Language Models, Instruction Tuning, Long-Chain-of-Thought (Long-CoT) Distillation, Singular Value Decomposition, Structural Changes in LLMs
TL;DR: Post-training in large language models induces consistent singular value scaling and orthogonal transformations of singular vectors, suggesting it acts as a reparameterization of invariant subspaces in the pretrained parameter space.
Abstract: Post-training fundamentally alters the behavior of large language models (LLMs), yet its impact on the internal parameter space remains poorly understood. In this work, we conduct a systematic singular value decomposition (SVD) analysis of principal linear layers in pretrained LLMs, focusing on two widely adopted post-training methods: *instruction tuning* and *long-chain-of-thought (Long-CoT) distillation*. Our analysis reveals two consistent and unexpected structural changes: **(1) a near-uniform geometric scaling of singular values across layers**, which theoretically modulates attention scores; and **(2) highly consistent orthogonal transformations applied to the left and right singular vectors of each matrix.** Disrupting this orthogonal consistency leads to catastrophic performance degradation. Based on these findings, we propose a simple yet effective framework that interprets post-training as a reparameterization of fixed subspaces in the pretrained parameter space. Further experiments reveal that singular value scaling behaves as a secondary effect, analogous to a temperature adjustment, whereas the core functional transformation lies in the coordinated rotation of singular vectors. These results challenge the prevailing view of the parameter space of large models as a black box, uncover the first clear regularities in how parameters evolve during training, and provide a new perspective for deeper investigation into parameter changes in LLMs.
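The abstract describes an SVD-based comparison of paired weight matrices. Below is a minimal sketch of how such a diagnostic could look, assuming PyTorch and access to the same linear layer's weights from a base checkpoint (`W_pre`) and a post-trained checkpoint (`W_post`); the function name and diagnostics are illustrative, not the paper's released code, and SVD sign/permutation ambiguity means the recovered transforms are only identifiable up to such symmetries.

```python
import torch

def svd_structural_diff(W_pre: torch.Tensor, W_post: torch.Tensor):
    """Illustrative sketch (not the authors' code): probe the two structural
    changes the abstract reports between a pretrained and a post-trained
    weight matrix."""
    U0, S0, V0h = torch.linalg.svd(W_pre, full_matrices=False)
    U1, S1, V1h = torch.linalg.svd(W_post, full_matrices=False)

    # (1) Near-uniform geometric scaling of singular values: if post-training
    # rescales the spectrum roughly uniformly, S1 / S0 should be close to a
    # constant across the (non-negligible) singular values.
    scale_ratio = S1 / S0

    # (2) Orthogonal transformation of singular vectors: R_U maps the
    # pretrained left singular basis onto the post-trained one (and R_V the
    # right basis). If the change is a coordinated rotation, each R should be
    # close to orthogonal; note SVD sign flips make R identifiable only up to
    # per-direction signs.
    R_U = U1.T @ U0
    R_V = V1h @ V0h.T
    eye = torch.eye(R_U.shape[0])
    ortho_err_U = torch.linalg.norm(R_U @ R_U.T - eye)  # Frobenius norm
    ortho_err_V = torch.linalg.norm(R_V @ R_V.T - torch.eye(R_V.shape[0]))
    return scale_ratio, ortho_err_U, ortho_err_V
```

On real checkpoints, `W_pre` and `W_post` would be the same layer's weight (e.g., an attention projection) loaded from the base and the instruction-tuned or Long-CoT-distilled model; a near-constant `scale_ratio` and small orthogonality errors would be consistent with the reported reparameterization view.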
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 12861