Evading Data Contamination Detection for Language Models is (too) Easy

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: large language models, model evaluation, malicious actors
TL;DR: We analyze current contamination detection methods and find a significant vulnerability in their assumptions that can be easily exploited by malicious actors.
Abstract: The benchmark performance of large language models (LLMs) strongly influences their popularity and is thus of great importance to many model providers. However, the reliability of such benchmark scores as a measure of model quality is compromised if the model has been contaminated with benchmark data. While recent contamination detection methods attempt to address this issue, they overlook the possibility of deliberate contamination by malicious model providers aiming to evade detection. We propose a categorization of model providers based on their (de)contamination practices and argue that malicious contamination is of crucial importance, as it casts doubt on the reliability of public benchmarks. To study this issue more rigorously, we analyze current contamination detection methods based on their underlying assumptions. This analysis reveals a significant vulnerability in existing approaches: they do not account for rephrased benchmark data used during training by malicious actors. We demonstrate how exploiting this gap can yield significantly inflated benchmark scores while completely evading current detection methods.
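The exploit the abstract describes can be illustrated in a few lines. The sketch below is ours, not the paper's code: it shows how a malicious provider might paraphrase benchmark samples before adding them to a training set, so that verbatim or n-gram overlap checks no longer fire. The `rephrase` function is a hypothetical stand-in for an LLM paraphrasing call.

```python
# Minimal sketch (illustrative, not from the paper) of rephrase-based
# contamination: train on benchmark content whose surface form has been
# changed, so exact-match / n-gram-overlap contamination detectors miss it.

def rephrase(text: str) -> str:
    """Hypothetical paraphraser. In practice a malicious actor would use an
    LLM to rewrite each sample so the wording changes while the
    answer-relevant content is preserved."""
    replacements = {
        "What is": "Determine",
        "Which of the following": "Identify which option",
    }
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

benchmark = [
    {"question": "What is the capital of France?", "answer": "Paris"},
]

# Contaminated training data: the paraphrased questions carry the benchmark's
# content (and answers) but no longer match the benchmark verbatim.
contaminated_training_data = [
    {"prompt": rephrase(ex["question"]), "completion": ex["answer"]}
    for ex in benchmark
]

print(contaminated_training_data)
# [{'prompt': 'Determine the capital of France?', 'completion': 'Paris'}]
```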
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9617