Detecting Data Contamination in LLMs via In-Context Learning

Published: 24 Sept 2025, Last Modified: 24 Sept 2025
Venue: NeurIPS 2025 LLM Evaluation Workshop Poster
License: CC BY 4.0
Keywords: LLM, Contamination, In-context learning
TL;DR: We propose Contamination Detection via Context (CoDeC), a simple, efficient, and model-agnostic method that detects training data contamination in LLMs by measuring how in‑context examples affect predictions.
Abstract: We present Contamination Detection via Context (CoDeC), a simple and accurate method to detect and quantify training data contamination in large language models. CoDeC distinguishes between data memorized during training and data outside the training distribution by measuring how in-context learning affects model performance. We find that appropriate in-context examples typically boost confidence on unseen datasets but may reduce it when the dataset was part of training, due to disrupted memorization patterns. Experiments show that CoDeC produces interpretable contamination scores that clearly separate seen and unseen datasets, and reveals strong evidence of memorization in open-weight models with undisclosed training corpora. The method is automated, parameter-free, and both model- and dataset-agnostic, making it easy to integrate with benchmark evaluations.
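To make the core idea concrete, here is a minimal sketch of a confidence-based probe in the spirit of CoDeC, assuming access to a causal LM through Hugging Face transformers. The paper's exact scoring rule is not given on this page, so the `contamination_score` below (zero-shot minus few-shot mean log-likelihood, averaged over probe items) is an illustrative assumption, not the published formula; the model name and helper functions are likewise placeholders.

```python
# Sketch of a CoDeC-style contamination probe (illustrative, not the
# paper's exact method): compare model confidence on dataset items with
# and without in-context examples drawn from the same dataset.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # any causal LM; a small model keeps the sketch cheap
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


def avg_logprob(context: str, target: str) -> float:
    """Mean log-probability of the `target` tokens given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(target, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i+1; slice out the target span.
    tgt_logits = logits[0, ctx_ids.shape[1] - 1 : -1]
    logprobs = torch.log_softmax(tgt_logits, dim=-1)
    token_lp = logprobs.gather(1, tgt_ids[0].unsqueeze(1)).squeeze(1)
    return token_lp.mean().item()


def contamination_score(examples: list[tuple[str, str]], k: int = 4) -> float:
    """Average (zero-shot - few-shot) confidence over probe items.

    Positive values mean in-context examples *hurt* confidence, which a
    CoDeC-style reading treats as a memorization signal; negative values
    mean the examples help, as expected for unseen data.
    """
    demos, probes = examples[:k], examples[k:]
    shots = "".join(f"{q} {a}\n" for q, a in demos)
    deltas = []
    for q, a in probes:
        zero_shot = avg_logprob(q + " ", a)
        few_shot = avg_logprob(shots + q + " ", a)
        deltas.append(zero_shot - few_shot)
    return sum(deltas) / len(deltas)
```

Because the score only requires log-likelihoods under the model, a probe like this stays model- and dataset-agnostic: it needs no retraining, no access to the training corpus, and no tuned thresholds beyond the choice of k demonstrations.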
Submission Number: 104