Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Zhenting Qi; Hanlin Zhang; Eric P. Xing; Sham M. Kakade; Himabindu Lakkaraju

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Zhenting Qi, Hanlin Zhang, Eric P. Xing, Sham M. Kakade, Himabindu Lakkaraju

Published: 04 Mar 2024, Last Modified: 02 May 2024DPFM 2024 PosterEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Retrieval-Augmented Generation, Privacy, Security

Abstract: Retrieval-Augmented Generation (RAG) improves Language Models (LMs) by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context based RAG systems. We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs that span Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and the exploitability exacerbates as the model size scales up. Extending our study to production models GPTs, we design an attack that can cause datastore leakage with a 100\% success rate on 25 randomly selected customized GPTs within at most 2 queries, and we show that with only 100 questions generated by GPT-4, one can attack GPTs to extract 36\% text data verbatim from a book of 77,000 words.

Submission Number: 41

Loading