Keywords: Learning Dynamics, Mechanistic Interpretability, LLM Knowledge Acquisition, Continual Pre-Training
Abstract: Human beings primarily understand the world through concepts (e.g., $\textit{dog}$), abstract mental representations that structure perception, reasoning, and learning. However, how large language models (LLMs) acquire, retain, and forget such concepts during continual pre-training remains poorly understood. In this work, we study how individual concepts are acquired and forgotten, as well as how multiple concepts interact through interference and synergy. We further link these behavioral dynamics to LLMs' internal $\textbf{Concept Circuits}$, computational subgraphs associated with specific concepts, and incorporate $\textbf{Graph Metrics}$ to characterize circuit structure. Our analysis reveals that: (1) LLMs' concept circuits provide a non-trivial, statistically significant signal of concept learning and forgetting; (2) concept circuits exhibit a stage-wise temporal pattern during continual pre-training, with an early increase followed by a gradual decrease and stabilization; (3) concepts with larger learning gains tend to exhibit greater forgetting under subsequent training; (4) semantically similar concepts induce stronger interference than weakly related ones; and (5) pre-training on one knowledge type can facilitate learning of another, with highly directional and uneven benefits across ordered pairs. Together, our findings offer a circuit-level view of concept learning dynamics and inform the design of more interpretable and robust concept-aware training strategies for LLMs.
Paper Type: Long
Research Area: Language Models
Research Area Keywords: pre-training, continual learning, knowledge tracing, probing
Contribution Types: Model analysis & interpretability
Languages Studied: English
Submission Number: 8516
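To make the abstract's notion of graph metrics over concept circuits concrete, here is a minimal sketch, assuming a concept circuit can be represented as a directed acyclic subgraph of the model's computational graph. The component names, edges, and choice of metrics below are illustrative assumptions, not the paper's actual circuit-extraction procedure or metric definitions.

```python
# Illustrative sketch: characterizing a "concept circuit" with standard graph
# metrics. The circuit here is a hypothetical placeholder, not from the paper.
import networkx as nx

# Hypothetical circuit: nodes are model components (embeddings, attention
# heads, MLP blocks, logits); edges are retained attribution links.
circuit = nx.DiGraph()
circuit.add_edges_from([
    ("embed", "attn.0.3"),
    ("attn.0.3", "mlp.1"),
    ("mlp.1", "attn.2.7"),
    ("attn.2.7", "logits"),
    ("embed", "mlp.1"),
])

# A few standard structural metrics one might track across training steps.
metrics = {
    "num_nodes": circuit.number_of_nodes(),
    "num_edges": circuit.number_of_edges(),
    "density": nx.density(circuit),
    "avg_in_degree": sum(d for _, d in circuit.in_degree()) / circuit.number_of_nodes(),
    "depth": nx.dag_longest_path_length(circuit),  # longest path through the circuit
}
print(metrics)
```

Tracking such metrics per concept over continual pre-training checkpoints is one plausible way to observe the stage-wise pattern (early growth, gradual shrinkage, stabilization) described in the abstract.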