Mapping Post-Training Forgetting in Language Models at Scale

Published: 26 Jan 2026, Last Modified: 02 Mar 2026 | ICLR 2026 Poster | CC BY 4.0
Keywords: continual learning, foundation models, reasoning, forgetting, pretraining knowledge
TL;DR: We quantify forgetting of pretraining knowledge during post-training using simple samplewise metrics -- providing an extensive empirical analysis and open problems in continual learning for foundation models
Abstract: Scaled post-training now drives many of the largest capability gains in language models (LMs), yet its effect on pretrained knowledge remains poorly understood. Not all forgetting is equal: forgetting one fact (e.g., a U.S. president or an API call) is not “averaged out” by recalling another. Hence, we propose a sample-wise paradigm to measure what is forgotten and when backward transfer occurs. Our metric counts 1→0 transitions (correct before post-training, incorrect after) to quantify forgetting and 0→1 transitions to quantify backward transfer. Traditional task averages conflate these effects and obscure large changes. For multiple-choice benchmarks, we add chance-adjusted variants that subtract the expected contribution of random guessing from pre- and post-training accuracies. We apply this framework across post-training stages, model sizes, and data scales. Our large-scale analysis across nearly 30 model pairs and 100 sub-benchmarks with up to 32,768 generated tokens per sample shows that: (1) Domain-continual pretraining induces moderate forgetting with low-to-moderate backward transfer; (2) RL/SFT post-training applied to base models and instruction tuning yields moderate-to-large backward transfer on math and logic with overall low-to-moderate forgetting; (3) Applying RL/SFT to instruction-tuned models is sensitive to data scale: at small scales, both forgetting and backward transfer are small; at larger scales, effects are mixed and warrant further study with better controls; (4) Model merging does not reliably mitigate forgetting. Overall, our framework offers a practical yardstick for mapping how post-training alters pretrained knowledge at scale, enabling progress towards generally capable AI systems.
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 6557
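
The sample-wise metric described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalization by the total number of samples, and the chance correction (the standard correction-for-guessing formula, clipped at zero) are assumptions made here for illustration, since the abstract does not state the exact formulas.

```python
from typing import Dict, Sequence


def transition_rates(before: Sequence[int], after: Sequence[int]) -> Dict[str, float]:
    """Per-sample transition rates between two models on the same benchmark.

    `before[i]` / `after[i]` are 1 if sample i was answered correctly
    before / after post-training, else 0.
    Assumption: rates are normalized by the total number of samples.
    """
    if len(before) != len(after):
        raise ValueError("before and after must cover the same samples")
    n = len(before)
    if n == 0:
        raise ValueError("need at least one sample")

    pairs = [(bool(b), bool(a)) for b, a in zip(before, after)]
    forgotten = sum(1 for b, a in pairs if b and not a)  # 1 -> 0 transitions
    gained = sum(1 for b, a in pairs if not b and a)     # 0 -> 1 transitions

    return {
        "forgetting": forgotten / n,
        "backward_transfer": gained / n,
        "accuracy_before": sum(b for b, _ in pairs) / n,
        "accuracy_after": sum(a for _, a in pairs) / n,
    }


def chance_adjusted_accuracy(accuracy: float, num_choices: int) -> float:
    """Accuracy with the expected contribution of random guessing removed.

    Assumption: standard correction for guessing on a k-way multiple-choice
    task, (acc - 1/k) / (1 - 1/k), clipped at 0. The paper's chance-adjusted
    variants may be defined differently.
    """
    if num_choices < 2:
        raise ValueError("num_choices must be at least 2")
    chance = 1.0 / num_choices
    return max(0.0, (accuracy - chance) / (1.0 - chance))


if __name__ == "__main__":
    # Hypothetical correctness vectors for six samples.
    before = [1, 1, 0, 0, 1, 0]
    after = [0, 1, 1, 0, 1, 1]
    print(transition_rates(before, after))
    print(chance_adjusted_accuracy(0.625, num_choices=4))
```

In this hypothetical example, accuracy rises from 3/6 to 4/6, so a task average reports only a net gain of one sample, while the sample-wise view shows that one sample was forgotten and two were gained.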