Detecting Training Data of Large Language Models via Expectation Maximization

ACL ARR 2025 May Submission 6824 Authors

20 May 2025 (modified: 04 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Membership inference attacks (MIAs) aim to determine whether a specific example was used to train a given language model. Prior work has explored prompt-based attacks such as ReCALL, but these methods rely heavily on access to known non-member examples and on the assumption that conditioning on non-member prefixes reliably suppresses the model’s likelihood of non-member texts. We propose EM-MIA, a new membership inference approach that iteratively refines prefix effectiveness and membership scores using an expectation-maximization strategy, without requiring labeled non-member examples. To support controlled evaluation, we introduce OLMoMIA, a benchmark that enables analysis of MIA robustness under systematically varied distributional overlap and difficulty. Experiments on WikiMIA and OLMoMIA show that EM-MIA outperforms existing baselines, particularly in settings with clear distributional separability. We highlight scenarios where EM-MIA succeeds in practical settings with partial distributional overlap, while failure cases expose fundamental limitations of current MIA methods under near-identical member and non-member distributions. We will release our code and evaluation pipeline upon publication to encourage reproducible and robust MIA research.
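To make the iterative refinement concrete, the sketch below illustrates one plausible reading of the EM-style loop described in the abstract: alternately (i) weighting candidate prefixes by how non-member-like they currently appear and (ii) rescoring membership from prefix-conditioned likelihood shifts. This is not the authors' released code; the function names `em_membership_scores` and `cond_loglik` are hypothetical, and `cond_loglik(model, text, prefix)` stands in for a user-supplied log-likelihood oracle.

```python
"""Minimal, hypothetical sketch of an EM-style membership-scoring loop.

Assumes a user-supplied oracle `cond_loglik(model, text, prefix)` that
returns the log-likelihood of `text` under `model`, optionally conditioned
on a `prefix` string. It is not part of any real library.
"""
import numpy as np


def em_membership_scores(model, candidates, cond_loglik, n_iters=5):
    n = len(candidates)
    # Unconditioned log-likelihood of every candidate text.
    base = np.array([cond_loglik(model, x, prefix=None) for x in candidates])

    # shift[i, j]: change in candidate i's log-likelihood when it is
    # conditioned on candidate j used as a prefix.
    shift = np.zeros((n, n))
    for i, x in enumerate(candidates):
        for j, p in enumerate(candidates):
            if i != j:
                shift[i, j] = cond_loglik(model, x, prefix=p) - base[i]

    # Initialize membership scores from the unconditioned likelihoods alone.
    member = (base - base.mean()) / (base.std() + 1e-8)

    for _ in range(n_iters):
        # "E-step": candidates currently scored as non-members make better
        # prefixes (mirroring ReCALL's use of known non-member prefixes),
        # so weight each prefix by how unlikely it is to be a member.
        w = np.exp(-member)
        w /= w.sum()

        # "M-step": members should be suppressed less than non-members when
        # conditioned on non-member-like prefixes, so rescore membership as
        # the prefix-weighted average likelihood shift, then renormalize.
        member = shift @ w
        member = (member - member.mean()) / (member.std() + 1e-8)

    return member  # higher score = more likely a training (member) example
```

The actual EM-MIA procedure may differ in how prefix effectiveness and membership scores are parameterized; this sketch only conveys the alternating-update structure implied by the abstract.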
Paper Type: Long
Research Area: Machine Learning for NLP
Research Area Keywords: large language models, membership inference attack, data contamination detection, memorization, unsupervised learning, resources and evaluation
Contribution Types: NLP engineering experiment, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Submission Number: 6824