It's all in the heads: An investigation of domain knowledge infusion into LLMs

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Large Language Models, Continued Pre-Training, Singular Value Decomposition, Linear Mode Connectivity, Domain Knowledge
Abstract: While large language models (LLMs) are widely studied, the mechanisms by which they internalize knowledge from specialized domains remain poorly understood. To investigate this, we analyze the continual pre-training (CPT) paradigm, in which a base model is further pre-trained on a curated, domain-specific corpus. Through a study across diverse domains, including mathematics, instruction, code, and text data, we uncover novel properties of this process. By analyzing the singular value decomposition (SVD) of model weights, we determine that the difference before and after CPT is attributable predominantly to changes in the singular vectors rather than the singular values. We identify **head heterogeneity** in the behavior of attention weight matrices. We investigate the effect of rewinding attention heads on model quality by ordering them according to various scalar criteria. Based on our analysis, we propose a novel head importance criterion that allows either truncating up to **60**% of the heads in the model increment or achieving up to a **4**% quality increase by partially rewinding heads to their pre-trained state. Further, we discover **domain connectivity** — *i.e.*, the ability to linearly interpolate between CPT checkpoints on different domains without significant quality loss — and discuss key quality drivers of this phenomenon. To foster further research, we provide NetInspect, an open-source, scalable toolkit for performing spectral analysis on models with billions of parameters. The code is available at https://anonymous.4open.science/r/netinspect-EF67
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 13125