Contamination Detection for VLMs Using Multi‑Modal Semantic Perturbations

Published: 26 Jan 2026, Last Modified: 26 Feb 2026
Venue: ICLR 2026 Poster
License: CC BY 4.0
Keywords: Vision Language Models, Data Contamination
TL;DR: We devise a novel contamination detection method for vision language models.
Abstract: Recent advances in Vision–Language Models (VLMs) have achieved state-of-the-art performance on numerous benchmark tasks. However, the use of internet-scale, often proprietary, pretraining corpora raises a critical concern for both practitioners and users: inflated performance due to test-set leakage. While prior work on LLMs has proposed mitigation strategies such as decontaminating pretraining data and redesigning benchmarks, the complementary direction of detecting contaminated VLMs remains underexplored. To address this gap, we deliberately contaminate open-source VLMs on popular benchmarks and show that existing detection approaches either fail outright or behave inconsistently. We then propose a novel, simple yet effective detection method based on multi-modal semantic perturbation, demonstrating that contaminated models fail to generalize under controlled perturbations. Finally, we validate our approach across multiple realistic contamination strategies, confirming its robustness and effectiveness. The code and perturbed dataset are available at https://github.com/jadenpark0/mm-perturb.
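The detection signal described in the abstract (contaminated models fail to generalize under controlled perturbations) can be illustrated with a minimal sketch. It assumes each benchmark item is paired with one semantically perturbed counterpart and that per-item correctness has already been computed. The names `PairedResult`, `contamination_gap`, and `flag_contaminated`, the `margin` value, and the optional `baseline_gap` (the drop measured on a model assumed to be clean) are hypothetical illustrations, not the scoring procedure from the paper or the mm-perturb repository.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PairedResult:
    # Hypothetical record: correctness on the original benchmark item
    # and on its semantically perturbed counterpart.
    original_correct: bool
    perturbed_correct: bool


def accuracy(flags: Sequence[bool]) -> float:
    # Fraction of correct answers; 0.0 for an empty list.
    return sum(flags) / len(flags) if flags else 0.0


def contamination_gap(results: Sequence[PairedResult]) -> float:
    # Accuracy on original items minus accuracy on perturbed items.
    # A large positive gap suggests the model memorized the originals.
    orig = accuracy([r.original_correct for r in results])
    pert = accuracy([r.perturbed_correct for r in results])
    return orig - pert


def flag_contaminated(
    results: Sequence[PairedResult],
    margin: float = 0.1,        # hypothetical decision margin
    baseline_gap: float = 0.0,  # gap observed on a presumed-clean reference model
) -> bool:
    # Flag the model if its drop exceeds the reference drop by more than `margin`.
    return contamination_gap(results) - baseline_gap > margin


# Usage: a model that answers every original item but only 6 of 10 perturbed ones.
suspect = [PairedResult(True, i < 6) for i in range(10)]
print(contamination_gap(suspect), flag_contaminated(suspect))
```

Subtracting a baseline gap is one way to account for perturbed items being intrinsically harder or easier than the originals; how the threshold should be calibrated in practice is an assumption here, not something the abstract specifies.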
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 15475