Keywords: Memorization
Abstract: Large language models are known to memorize training data, raising concerns about privacy, copyright, and unintended data leakage. Yet how memorization evolves across the training lifecycle of modern models remains poorly understood. In this work, we analyze memorization dynamics across pre-training, cooldown, and model merging using intermediate checkpoints from the OLMo-2 13B training trajectory. We distinguish between theoretical memorization, measured by sequence likelihood, and practical extractability, measured via prefix-based extraction attacks. Our experiments reveal three key phenomena. First, memorization increases with repetition frequency. Second, we uncover a strong recency bias: data introduced during the final cooldown phase becomes significantly more extractable than earlier data despite fewer total exposures, indicating that earlier-seen data is effectively forgotten in token-rich regimes. Third, we identify a merging anomaly: although weight-averaged models exhibit loss values consistent with reduced overfitting, their practical extractability is often higher than that of any individual ingredient model. This divergence shows that extractability and compressibility, typically treated as correlated, can decouple after model merging. Overall, our findings emphasize the need for training-stage-aware evaluation and provide new insights into memorization in modern LLM training pipelines.
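The two measurements named in the abstract, prefix-based extractability and uniform weight averaging for merging, can be sketched as follows. This is a minimal illustration assuming a greedy-decoding `generate` callable and checkpoint state dicts keyed by parameter name; the function names are illustrative, not the authors' actual code.

```python
def is_extractable(generate, sequence, prefix_len):
    """Prefix-based extraction test: prompt the model with the first
    `prefix_len` tokens and check whether greedy decoding reproduces
    the true continuation verbatim (the 'practical extractability'
    criterion described in the abstract)."""
    prefix = sequence[:prefix_len]
    continuation = sequence[prefix_len:]
    generated = generate(prefix, max_new_tokens=len(continuation))
    return list(generated) == list(continuation)


def merge_checkpoints(state_dicts):
    """Uniform weight averaging over ingredient checkpoints, the simplest
    form of the model merging discussed in the abstract. Each state dict
    maps parameter names to (here, scalar) weights."""
    keys = state_dicts[0].keys()
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}


# Toy usage: a stub "model" that has memorized one training sequence.
if __name__ == "__main__":
    seq = [101, 17, 42, 9, 3]
    memorized = lambda prefix, max_new_tokens: seq[len(prefix):len(prefix) + max_new_tokens]
    print(is_extractable(memorized, seq, prefix_len=2))   # extraction succeeds
    print(merge_checkpoints([{"w": 1.0}, {"w": 3.0}]))    # averaged weights
```

In practice `generate` would wrap a checkpointed LLM and sequences would be token IDs from the training corpus; the extraction criterion above is the exact-match variant, though relaxed (edit-distance) criteria are also common in this literature.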
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 126