Keywords: Large language models, membership inference, pretraining data
TL;DR: By assuming that all documents in a set are either entirely included in or entirely excluded from the training data, we introduce a novel membership inference method, Set-MI, that significantly improves over prior methods.
Abstract: Membership Inference (MI) is the task of determining whether a document was included in the training data of a given model. MI provides an effective post-training alternative for analyzing training datasets when access to them is restricted, including studying the impact of data choices on downstream performance, detecting copyrighted content in training sets, and checking for evaluation set contamination. However, black-box language models (LMs) that expose only a document's loss may not provide a reliable signal for determining membership. In this work, we leverage the insight that documents sharing certain attributes (e.g., time of creation) are expected to be either all included in a training set or all excluded, and develop methods that aggregate membership predictions over these documents. We apply our set assumption to five different domains (e.g., Wikipedia, arXiv) and find that our method improves prior MI methods by 0.14 AUROC on average. We further analyze the impact of language model size, training data deduplication, and the method used to aggregate membership predictions over sets. We find that our approach is most effective on larger, undeduplicated models when more documents are available per set and longer sequences are sampled per document, and we show its robustness to noise in the set assumption under practical settings.
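The aggregation idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual Set-MI implementation: the per-document scores, the mean aggregation, and the example values are all assumptions made for clarity.

```python
# Hedged sketch of set-level membership inference: given per-document
# membership scores (e.g., derived from model loss), combine them into a
# single set-level score under the assumption that all documents in a set
# are either entirely in or entirely out of the training data.

def set_membership_score(doc_scores, aggregate=None):
    """Aggregate per-document MI scores into one set-level score.

    `aggregate` defaults to the mean; the paper studies several
    aggregation methods, so this choice is illustrative only.
    """
    if aggregate is None:
        aggregate = lambda xs: sum(xs) / len(xs)
    return aggregate(doc_scores)

# Hypothetical per-document scores (higher = more likely a member).
member_set = [0.8, 0.6, 0.9]      # documents assumed in the training data
nonmember_set = [0.2, 0.4, 0.1]   # documents assumed held out

# Averaging over a set smooths out unreliable per-document signals,
# so the set-level scores separate members from non-members more cleanly.
print(set_membership_score(member_set))
print(set_membership_score(nonmember_set))
```

In this toy example, the set-level score for the member set exceeds that of the non-member set even though individual per-document scores overlap in noisier settings; that separation is the intuition behind aggregating over sets.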
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13241