A Missing Testbed for LLM Pre-Training Membership Inference Attacks

Published: 06 Mar 2025, Last Modified: 30 Apr 2025
Venue: ICLR 2025 Workshop Data Problems (Poster)
License: CC BY 4.0
Keywords: Large Language Models, Membership Inference Attacks, Data, Privacy, Benchmark
TL;DR: We provide a missing testbed for membership inference attacks against pre-training data for LLMs.
Abstract: We introduce a simple and rigorous testbed for membership inference attacks (MIAs) against pre-training sequences of large language models (LLMs). Our testbed addresses three gaps in existing evaluations, which lack: (1) uniform sampling of member/non-member documents of varying lengths from pre-training shards; (2) large-scale deduplication at varying strengths, both within and across the sampled members and non-members; and (3) rigorous statistical tests that detect member/non-member distribution shifts, which cause faulty evaluations and are otherwise imperceptible to the heuristic techniques used in prior work. We provide both global- and domain-level datasets (e.g., Reddit, Stack Exchange, Wikipedia), derived from fully open pre-trained LLM/dataset pairs including Pythia/Pile, Olmo/Dolma, and our custom GPT-2-Large pre-trained on FineWeb-Edu. We additionally open-source a modular and extensible codebase that facilitates the creation of custom, statistically validated, and deduplicated evaluation data from future open models and datasets. In sum, our work is a concrete step towards addressing the evaluation issues discussed in prior work.
Submission Number: 87
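
Illustrative sketch (not from the submission): the abstract does not say which statistical tests the testbed uses, so the following is only a minimal, assumed example of how a member/non-member distribution shift could be checked. It runs a two-sided permutation test on the difference of means of one scalar feature per document (here, a hypothetical token count). The function name `permutation_test`, its parameters, and the choice of feature are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_test(members, non_members, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means.

    `members` and `non_members` are sequences of one scalar feature per
    document (e.g. token counts). Returns an estimated p-value; a small
    value suggests the two samples differ on this feature, which would
    let an attack separate them without using the model at all.
    """
    rng = random.Random(seed)
    observed = abs(mean(members) - mean(non_members))
    pooled = list(members) + list(non_members)
    n = len(members)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign membership labels and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical feature values: members are systematically shorter.
shifted_p = permutation_test([100 + i for i in range(30)],
                             [200 + i for i in range(30)])
# Identical samples: no shift to detect.
same_p = permutation_test(list(range(30)), list(range(30)))
```

In this toy setup `shifted_p` is small and `same_p` is large. A real check would likely need richer features than length and a correction for multiple comparisons when many features are tested.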