Block-Level Weight-Space Structure Persists Under Post-Training: An Empirical Study Across LLM Families

Zhaohui Geoffrey Wang

Published: 24 May 2026, Last Modified: 02 Jun 2026. Venue: ICML 2026 Workshop WSS Poster. Readers: Everyone. License: CC BY 4.0

Keywords: Weight-Space Geometry, Model Symmetry, Post-Training Dynamics, Transformer Structure, Parameter Perturbation, Mode Connectivity, Representation Invariance, Model Compression, Weight Sharing, Neural Network Geometry

TL;DR: Post-training perturbs all parameters but preserves block-level geometry, revealing a structured weight-space invariance that enables efficient model sharing.

Abstract: Modern LLMs are deployed as families of post-trained variants (base, instruct, chat, code) derived from a shared set of pre-trained weights. We present an empirical study of how post-training transforms weight-space geometry across eight configurations spanning four architecture families (Qwen2.5, Llama-3.1/3.2, Mistral, Gemma-2). We identify a granularity gap: post-training modifies every tensor (zero of 291–339 tensors remain byte-identical, so hash-based deduplication yields 0% savings), yet preserves block-level structure (mean cosine similarity >0.99 and relative Frobenius distance <0.13). Post-training therefore acts as a structured perturbation that shifts all parameters while preserving block-level geometry. This property is not universal: independently trained variants (e.g., Qwen2.5-Coder) exhibit much lower similarity (~0.64), indicating a disconnected region of weight space. Perturbation magnitude further varies systematically with model scale, architecture, and training recipe. As a practical application, we build LinkerLLM, a lazy loader that exploits block-level persistence to share weights across co-resident variants, achieving 18–48% GPU memory savings and enabling up to five 7B variants on a single 24GB GPU. Five of eight configurations retain ≥94% of original quality across standard benchmarks.
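The granularity gap described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes a "block" is the set of tensors belonging to one transformer layer, flattened and concatenated into a single vector, and it uses synthetic NumPy arrays in place of real checkpoints. The names `tensor_hash`, `block_metrics`, and the tensor keys are hypothetical.

```python
import hashlib
import numpy as np

def tensor_hash(t):
    """SHA-256 of a tensor's raw bytes (stand-in for hash-based deduplication)."""
    return hashlib.sha256(np.ascontiguousarray(t).tobytes()).hexdigest()

def block_metrics(base, variant):
    """Compare one block of two checkpoints.

    base, variant: dict mapping tensor name -> np.ndarray with matching shapes.
    Returns (cosine similarity, relative Frobenius distance, number of
    byte-identical tensors). Treating the block as the concatenation of its
    flattened tensors is an assumption made for this sketch.
    """
    names = sorted(base)
    a = np.concatenate([base[n].ravel() for n in names]).astype(np.float64)
    b = np.concatenate([variant[n].ravel() for n in names]).astype(np.float64)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    # The 2-norm of the flattened vector equals the Frobenius norm of the block.
    rel_frob = float(np.linalg.norm(b - a) / np.linalg.norm(a))
    identical = sum(tensor_hash(base[n]) == tensor_hash(variant[n]) for n in names)
    return cos, rel_frob, identical

# Synthetic stand-in for a base block and a post-trained variant:
# every weight is perturbed by small noise, so no tensor stays byte-identical.
rng = np.random.default_rng(0)
base_block = {
    "q_proj": rng.standard_normal((64, 64)).astype(np.float32),
    "k_proj": rng.standard_normal((64, 64)).astype(np.float32),
    "mlp_up": rng.standard_normal((64, 256)).astype(np.float32),
}
variant_block = {
    n: w + np.float32(0.05) * rng.standard_normal(w.shape).astype(np.float32)
    for n, w in base_block.items()
}

cos, rel_frob, identical = block_metrics(base_block, variant_block)
print(f"cosine={cos:.4f} rel_frobenius={rel_frob:.4f} identical_tensors={identical}")
```

On this synthetic example the hash check finds no shared tensors even though the block-level cosine similarity stays close to 1, which is the pattern the abstract reports for real post-trained variants. The noise scale of 0.05 is arbitrary and chosen only to make the contrast visible.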

Email Sharing: We authorize the sharing of all author emails with Program Chairs.

Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.

Submission Number: 64
