How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY 4.0
Abstract: Dataset contamination, where evaluation datasets overlap with pre-training corpora, inflates performance metrics and undermines the reliability of model evaluations. Measuring dataset contamination is therefore essential to ensure that performance evaluations genuinely reflect a model's ability to generalize to unseen data, rather than its reliance on memorized examples. To address this problem, we propose the Kernel Divergence Score (KDS), a novel method that evaluates dataset contamination by computing the divergence between the kernel similarity matrices of sample embeddings obtained before and after fine-tuning on the benchmark dataset. Leveraging the insight that fine-tuning affects unseen samples more significantly than seen ones, KDS provides a reliable measure of contamination. Through extensive experiments on controlled contamination scenarios, KDS demonstrates a near-perfect correlation with contamination levels and outperforms existing baselines. Additionally, we perform comprehensive ablation studies to analyze the impact of key design choices, providing deeper insights into the components and effectiveness of KDS. These ablations highlight the importance of leveraging fine-grained kernel-based information and confirm the reliability of the proposed framework across diverse datasets and settings. Code is released at https://github.com/deeplearning-wisc/kernel-divergence-score.
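The following is a minimal sketch of the idea described in the abstract, not the authors' exact implementation: it builds kernel similarity matrices over sample embeddings taken before and after fine-tuning, then compares the two matrices with a simple divergence. The kernel choice (RBF), the bandwidth `gamma`, and the KL-style divergence over row-normalized kernel rows are illustrative assumptions; the released code defines the actual KDS.

```python
# Illustrative sketch only: RBF kernel and row-wise KL divergence are assumed,
# not taken from the paper. See the linked repository for the real KDS.
import numpy as np


def rbf_kernel(embeddings: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    """Pairwise RBF similarity matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(embeddings ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * embeddings @ embeddings.T
    return np.exp(-gamma * np.clip(sq_dists, 0.0, None))


def kernel_divergence_score(emb_before: np.ndarray, emb_after: np.ndarray,
                            gamma: float = 1.0, eps: float = 1e-12) -> float:
    """Average row-wise KL divergence between the two kernel matrices.

    Each row of a kernel matrix is normalized into a distribution over
    neighbors; a small divergence means fine-tuning barely changed pairwise
    similarities, which is the signal associated with previously seen
    (contaminated) samples.
    """
    k_before = rbf_kernel(emb_before, gamma)
    k_after = rbf_kernel(emb_after, gamma)
    p = k_before / (k_before.sum(axis=1, keepdims=True) + eps)
    q = k_after / (k_after.sum(axis=1, keepdims=True) + eps)
    row_kl = np.sum(p * np.log((p + eps) / (q + eps)), axis=1)
    return float(row_kl.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb_before = rng.normal(size=(100, 32))                       # base-model embeddings
    emb_after = emb_before + 0.05 * rng.normal(size=(100, 32))    # embeddings after fine-tuning
    print(f"KDS (illustrative): {kernel_divergence_score(emb_before, emb_after):.4f}")
```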
Lay Summary: When we test AI models, it is important to make sure they are solving new problems, not just memorizing things they have seen before. But sometimes the test questions (datasets) accidentally include parts the model has already seen during its training. This makes the model look smarter than it really is. To address this issue, we introduce a new method called the Kernel Divergence Score (KDS), which measures how much the model’s understanding of the data changes before and after learning from a new set of examples. If the model’s view doesn’t change much, it probably already saw those examples before. We tested this method on many benchmarks, confirming that it works more reliably than previous approaches.
Link To Code: https://github.com/deeplearning-wisc/kernel-divergence-score
Primary Area: Deep Learning->Large Language Models
Keywords: Dataset Leakage, Data Contamination, Large Language Model
Submission Number: 2267