Keywords: Membership Inference Attacks, Privacy, LLMs, Data Extraction Attacks
Abstract: Extracting training data from large language models (LLMs) exposes serious memorization issues and privacy risks. Existing attacks first generate candidate sequences and then apply membership inference to them. However, these attacks do not guide the generation itself, and the extraction scope of member data is limited by the greedy decoding scheme: only verbatim-memorized member data is audited in this process, while the majority of member data remains unexplored, even if it is partially memorized. In this work, we define a new notion of memorization, $k$-amendment-completable, to measure the degree of partial memorization. Greedy decoding can only extract $0$-amendment-completable sequences, i.e., verbatim-memorized ones. To address this limitation in generation, we propose a membership decoding scheme that introduces membership information to guide the generation process. We formulate training data extraction as an iterative member-token inference problem: at each generation step, the token distribution is calibrated with membership information to explore member data. Extensive experiments show that membership decoding extracts novel member data that has not been studied before. The proposed attack demonstrates that the privacy risk in LLMs is underestimated.
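The abstract does not spell out how the token distribution is calibrated. Below is a minimal sketch of one plausible instantiation, assuming a reference-model likelihood ratio as the membership signal; the function `membership_decode`, the weight `alpha`, and the model choices are illustrative assumptions, not the paper's method.

```python
import torch

def membership_decode(prompt, target, reference, tokenizer,
                      max_new_tokens=50, alpha=1.0):
    """Greedy decoding over membership-calibrated token scores.

    At each step, the target model's next-token log-probabilities are
    shifted by a per-token membership signal (here assumed to be the
    log-likelihood ratio against a reference model), so that tokens the
    target model likely memorized are preferred over tokens that are
    merely fluent under any model.
    """
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        with torch.no_grad():
            t_logp = target(ids).logits[0, -1].log_softmax(-1)
            r_logp = reference(ids).logits[0, -1].log_softmax(-1)
        # Calibrate the distribution with the assumed membership signal:
        # high (t_logp - r_logp) suggests the target memorized the token.
        scores = t_logp + alpha * (t_logp - r_logp)
        next_id = scores.argmax()
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

# Hypothetical usage with Hugging Face causal LMs:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("gpt2-xl")
# tgt = AutoModelForCausalLM.from_pretrained("gpt2-xl").eval()
# ref = AutoModelForCausalLM.from_pretrained("gpt2").eval()
# print(membership_decode("The patient record reads:", tgt, ref, tok))
```

Setting `alpha = 0` recovers plain greedy decoding, which, per the abstract, can extract only $0$-amendment-completable (verbatim-memorized) sequences.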
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Submission Number: 25363