Language Models May Verbatim Complete Text They Were Not Explicitly Trained On

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 Spotlight Poster · CC BY 4.0
TL;DR: Under $n$-gram definitions of train-set inclusion, LLMs can verbatim complete “unseen” texts, both after deleting data and after adding “gibberish” data. Our results have implications for unlearning, membership inference, and data transparency.
Abstract: An important question today is whether a given text was used to train a large language model (LLM). A completion test is often employed: check whether the LLM completes a sufficiently complex text. This, however, requires a ground-truth definition of membership; most commonly, a text is deemed a member if it has n-gram overlap with some text in the training dataset. In this work, we demonstrate that this n-gram-based membership definition can be effectively gamed. We study scenarios where sequences are non-members for a given n and find that completion tests still succeed. We find many natural cases of this phenomenon by retraining LLMs from scratch after removing all training samples that were completed; these cases include exact duplicates, near-duplicates, and even short overlaps, and they showcase the difficulty of finding a single viable choice of n for membership definitions. Using these insights, we design adversarial datasets that cause a given target sequence to be completed without containing it, for any reasonable choice of n. Our findings highlight the inadequacy of n-gram-based membership, suggesting that such definitions fail to account for auxiliary information available to the training algorithm.
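To make the two ingredients in the abstract concrete, the sketch below shows an $n$-gram membership check and a verbatim completion test in Python. It is a minimal sketch assuming a Hugging Face causal LM; the helper names, the prefix length, greedy decoding, and the choice n=50 are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch: n-gram membership check and a verbatim completion test.
# Assumes a Hugging Face causal LM; helper names, prefix length, greedy
# decoding, and n=50 are illustrative choices, not the paper's exact setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

def ngram_set(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_ngram_member(target_tokens, corpus_token_seqs, n=50):
    """n-gram membership: the target counts as a member if any of its
    n-grams appears verbatim in some training document."""
    forbidden = ngram_set(target_tokens, n)
    return any(ngram_set(doc, n) & forbidden for doc in corpus_token_seqs)

def completion_test(model, tokenizer, text, prefix_len=50):
    """Completion test: prompt with a prefix of the target text and check
    whether greedy decoding reproduces the remaining suffix verbatim."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    prefix, suffix = ids[:, :prefix_len], ids[:, prefix_len:]
    out = model.generate(prefix, max_new_tokens=suffix.shape[1], do_sample=False)
    return out[:, prefix_len:prefix_len + suffix.shape[1]].equal(suffix)

# Example usage (model name is illustrative):
# tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
# lm = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
# print(completion_test(lm, tok, "some sufficiently long target text ..."))
```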
Lay Summary: What exactly do we mean by "training set inclusion" for language models? A vast body of work across research, policy, and even lawsuits has implicitly converged on definitions based on $n$-gram (substring) overlap: a piece of text is considered a "member" of the training set if some span of it (an $n$-gram) can be found in the training set. This paper is a tale of two experiments that demonstrate the fundamental limitations of all $n$-gram-based membership definitions. We ask two questions through the lens of (verbatim) text completion with a language model:

1. **Deletion:** Can we *prevent* the verbatim generation of a text by deleting all of its $n$-grams and retraining the model from scratch? The answer is no! Many deleted texts can still be generated verbatim by the retrained LLM.
2. **Addition:** Can we *cause* the verbatim generation of a text by training on texts with no $n$-gram overlap with it? The answer is yes! And it takes only a few gradient steps of fine-tuning.

The key message of this work is that data membership in LLMs extends beyond set membership of a text in the raw dataset; it also encompasses data neighborhoods ("soft membership") arising from LLM generalization, as well as data provenance, preprocessing, and other auxiliary information that the training algorithm accesses throughout the ML pipeline. Many subfields, such as copyright, unlearning, membership inference, and data transparency, require a membership definition, and our work shows that overly simplistic notions of membership hinder progress in these areas.
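To make the deletion experiment concrete, here is a minimal sketch of the filtering step it relies on: removing every training document that shares any $n$-gram with the target before retraining from scratch. The token-level representation, the function names, and the default n=50 are illustrative assumptions, not the authors' actual preprocessing pipeline.

```python
# Minimal sketch of the filtering step in the deletion experiment: drop every
# training document sharing any n-gram with the target, so the target becomes
# a non-member under the n-gram definition for this n. Token-level inputs,
# function names, and n=50 are illustrative assumptions.
from typing import Iterable, List, Set, Tuple

def ngram_set(tokens: List[int], n: int) -> Set[Tuple[int, ...]]:
    """All contiguous n-grams of a token sequence (same helper as above)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def filter_out_overlaps(target: List[int],
                        corpus: Iterable[List[int]],
                        n: int = 50) -> List[List[int]]:
    """Keep only documents with zero n-gram overlap with the target; a model
    retrained from scratch on the result never saw any n-gram of the target."""
    forbidden = ngram_set(target, n)
    return [doc for doc in corpus if not (ngram_set(doc, n) & forbidden)]
```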
Primary Area: Deep Learning->Large Language Models
Keywords: Training data membership, data completion, data reconstruction, membership inference, unlearning, privacy, training set inclusion, copyright
Submission Number: 2670