Keywords: data contamination, post-training, supervised fine-tuning, reinforcement learning
TL;DR: We run a controlled clean-vs-contaminated pre-training + SFT/RL experiment to analyze how post-training changes the impact of contamination; if it changes it significantly, pre-training contamination might be more critical.
Abstract: We present a controlled study of how dataset contamination interacts with the post-training stages of large language models. Starting from clean checkpoints of Qwen2.5 (0.5B/1.5B) and Gemma3 (1B/4B), we inject five copies of the GSM8k and MBPP test items into the first 2B tokens of an otherwise clean 25B-token extended pre-training dataset. We then compare the contaminated and clean models both immediately after pre-training and again after supervised fine-tuning (SFT) or reinforcement learning (RL); the post-training stages themselves contain no contamination. Across math and coding benchmarks, we find two consistent patterns: (i) Contamination causes performance spikes that gradually diminish with continued pre-training; even after 25B tokens, the apparent performance inflation from contamination can fall close to zero. (ii) Both SFT and RL resurface the leaked information, but with different patterns: SFT inflates scores only on the contaminated tasks (GSM8k, MBPP), whereas RL also improves performance on uncontaminated counterparts (GSMPlus, HumanEval). Our results underscore the need for contamination audits \emph{after} post-training and suggest that RL-based post-training, although not immune, can help alleviate overestimation problems.
Submission Number: 181