Keywords: memorization, LLMs, membership inference attacks, copyright, detecting pretraining data
Abstract: Large language models (LLMs) have been shown to reproduce copyrighted text, e.g., passages from news articles, sparking high-profile lawsuits against their providers. Yet copyright claims often rest on anecdotal evidence. To strengthen such cases, there is a clear need for automated, interpretable, and reliable methods for detecting such reproductions that are also suitable for legal proceedings. Current methods often equate an LLM’s ability to complete prefixes from source passages with memorization, but this metric fails when strong completions arise from generalization. We identify four key criteria for practical real-world deployment: methods must operate with only black-box access to the LLM while remaining reliable (high precision), efficient, and interpretable. We first investigate prior work and find simple non-memorized counterexamples that trigger false positives even in advanced methods designed to account for generalization, undermining their reliability in legal contexts. Going further, we introduce DualTest, which requires only a single API call per passage under test and produces human-interpretable results. Our key insight is to leverage a small proxy model to disentangle memorization from generalization on such counterexamples, improving recall by up to 3% over the strongest baseline.
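The abstract's central idea, using a small proxy model to separate memorization from generalization, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the prefix split, the character-level similarity measure (difflib's SequenceMatcher), and the decision margin below are all assumptions made for illustration.

```python
# Minimal sketch of the dual-model idea from the abstract: flag a
# passage as memorized only if the target LLM reproduces its true
# continuation far better than a small proxy model does. A proxy
# that also completes the passage well suggests the match is
# explainable by generalization rather than memorization.
# NOTE: query functions, similarity metric, and threshold are
# illustrative assumptions, not the paper's actual method.

from difflib import SequenceMatcher
from typing import Callable


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def dual_test(
    passage: str,
    query_target: Callable[[str], str],  # one black-box API call
    query_proxy: Callable[[str], str],   # small local proxy model
    prefix_frac: float = 0.5,            # assumed prefix/continuation split
    margin: float = 0.3,                 # assumed decision threshold
) -> bool:
    """Return True if the target's completion looks memorized,
    i.e., it matches the true continuation much more closely than
    the proxy model's completion does."""
    split = int(len(passage) * prefix_frac)
    prefix, continuation = passage[:split], passage[split:]

    target_score = similarity(query_target(prefix), continuation)
    proxy_score = similarity(query_proxy(prefix), continuation)

    # A high target score alone is not evidence of memorization:
    # only the gap over the proxy's score is attributed to it.
    return (target_score - proxy_score) > margin
```

In this reading, the single call to `query_target` is the one API request per passage that the abstract mentions, and the per-passage score gap is what makes the verdict human-interpretable.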
Primary Area: foundation or frontier models, including LLMs
Submission Number: 16062