Keywords: Large Language Models, Membership Inference Attacks, Data Privacy, Benchmark
TL;DR: We provide a missing testbed for membership inference attacks against pre-training data for LLMs.
Abstract: We introduce a simple and rigorous testbed for membership inference attacks (MIA) against pre-training sequences for large language models (LLMs).
Our testbed addresses three gaps in existing evaluations, which lack:
(1) \textit{uniform} sampling of member/non-member documents of varying lengths from pre-training shards;
(2) large-scale \textit{deduplication} at varying strengths, both within and across the sampled members/non-members; and
(3) rigorous \textit{statistical tests} to detect member/non-member distribution shifts that cause faulty evaluations and are otherwise imperceptible to the heuristic techniques used in prior work.
We provide both global- and domain-level datasets (e.g., Reddit, Stack Exchange, Wikipedia), derived from fully open pre-trained LLM/dataset pairs, including Pythia/Pile, OLMo/Dolma, and a GPT-2-Large model we pre-trained on FineWeb-Edu.
We additionally open-source a modular and extensible codebase that facilitates the creation of custom, statistically validated, and deduplicated evaluation data using future open models and datasets.
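For intuition only (this is not the released codebase's API; the names and threshold below are assumptions), near-duplicate filtering of candidate non-members against members could look like the following word n-gram Jaccard check, with the overlap threshold standing in for deduplication "strength":

```python
# Illustrative only (not the released codebase's API): drop candidate
# non-members whose word n-gram Jaccard overlap with any member exceeds a
# threshold; lowering `max_jaccard` corresponds to stronger deduplication.
def ngrams(text, n=13):
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(max(len(toks) - n + 1, 1))}

def dedup_against(members, candidates, n=13, max_jaccard=0.8):
    member_grams = [ngrams(m, n) for m in members]
    kept = []
    for doc in candidates:
        grams = ngrams(doc, n)
        overlap = max(
            (len(grams & mg) / max(len(grams | mg), 1) for mg in member_grams),
            default=0.0,
        )
        if overlap <= max_jaccard:
            kept.append(doc)
    return kept
```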
In sum, our work is a concrete step towards addressing the evaluation issues discussed by prior work.
Submission Number: 87