Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Published: 04 Mar 2024, Last Modified: 02 May 2024DPFM 2024 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Retrieval-Augmented Generation, Privacy, Security
Abstract: Retrieval-Augmented Generation (RAG) improves Language Models (LMs) by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context based RAG systems. We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs that span Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and the exploitability exacerbates as the model size scales up. Extending our study to production models GPTs, we design an attack that can cause datastore leakage with a 100\% success rate on 25 randomly selected customized GPTs within at most 2 queries, and we show that with only 100 questions generated by GPT-4, one can attack GPTs to extract 36\% text data verbatim from a book of 77,000 words.
Submission Number: 41
Loading