Keywords: Large language models, membership inference, pretraining data
TL;DR: By assuming that all documents in a set are either entirely included in or entirely excluded from the training data, we introduce a novel membership inference method, Set-MI, that significantly improves over prior methods.
Abstract: Membership Inference (MI) is the task of determining whether a document was included in the training data of a given model. MI provides an effective post-training alternative for analyzing training datasets when access to them is restricted, including studying the impact of data choices on downstream performance, detecting copyrighted content in training sets, and checking for evaluation set contamination. However, black-box language models (LMs) that expose only a document's loss may not provide a reliable signal for determining membership. In this work, we leverage the insight that documents sharing certain attributes (e.g., time of creation) are expected to be either all included in a training set or all excluded, and develop methods that aggregate membership predictions over these documents. We apply our set assumption to five different domains (e.g., Wikipedia, arXiv) and find that our method improves prior MI methods by 0.14 AUROC on average. We further analyze the impact of language model size, training data deduplication, and the method used to aggregate membership predictions over sets. We find that our approach is most effective on larger, undeduplicated models when more documents are available per set and longer sequences are sampled per document, and we show its robustness to noise in the set assumption under practical settings.
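The aggregation idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual Set-MI implementation: the per-document scores, the mean aggregation, and the example values are all assumptions made for clarity.

```python
# Hedged sketch of set-level membership inference: given per-document
# membership scores (e.g., derived from model loss), combine them into a
# single set-level score under the assumption that all documents in a set
# are either entirely in or entirely out of the training data.

def set_membership_score(doc_scores, aggregate=None):
    """Aggregate per-document MI scores into one set-level score.

    `aggregate` defaults to the mean; the paper studies several
    aggregation methods, so this choice is illustrative only.
    """
    if aggregate is None:
        aggregate = lambda xs: sum(xs) / len(xs)
    return aggregate(doc_scores)

# Hypothetical per-document scores (higher = more likely a member).
member_set = [0.8, 0.6, 0.9]      # documents assumed in the training data
nonmember_set = [0.2, 0.4, 0.1]   # documents assumed held out

# Averaging over a set smooths out unreliable per-document signals,
# so the set-level scores separate members from non-members more cleanly.
print(set_membership_score(member_set))
print(set_membership_score(nonmember_set))
```

In this toy example, the set-level score for the member set exceeds that of the non-member set even though individual per-document scores overlap in noisier settings; that separation is the intuition behind aggregating over sets.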
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13241