Private Retrieval Augmented Generation via Random Projection

ACL ARR 2026 January Submission1655 Authors

30 Dec 2025 (modified: 20 Mar 2026), ACL ARR 2026 January Submission, CC BY 4.0
Keywords: Retrieval Augmented Generation, Privacy, Large Language Model
Abstract: Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by querying external structured knowledge. However, it can also introduce privacy risks by leaking sensitive information from the retrieval database. We propose a simple method to preserve datastore privacy in RAG systems via random projection. By applying the same projection to both datastore embeddings and query embeddings, our method provably preserves semantic similarity between queries and retrieved items while substantially mitigating data extraction attacks. Across multiple RAG architectures and datasets, we show that this lightweight approach achieves better retrieval and generation performance than prior methods with formal differential privacy (DP) guarantees, while exhibiting comparable empirical privacy under strong attack models. To our knowledge, these results are the first to suggest that random projection can serve as a competitive and practical baseline for privacy-preserving RAG systems.
Paper Type: Short
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: Language Modeling, Information Retrieval and Text Mining
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1655
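
Illustrative sketch (not from the paper): the abstract states that one shared random projection is applied to both datastore and query embeddings and that similarity is preserved. The toy example below shows that idea only, under assumptions the abstract does not specify: a Gaussian projection matrix with entries drawn from N(0, 1/k), synthetic unit-norm embeddings, and inner-product retrieval. The names make_projection, project, and top1 are hypothetical helpers, and the sketch says nothing about the privacy evaluation.

```python
import math
import random


def make_projection(d, k, seed=0):
    """Build a k x d Gaussian matrix with entries ~ N(0, 1/k) (assumed form)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(matrix, vec):
    """Apply the projection matrix to a single embedding."""
    return [dot(row, vec) for row in matrix]


def unit(vec):
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def top1(query, items):
    """Index of the item with the largest inner product with the query."""
    return max(range(len(items)), key=lambda i: dot(query, items[i]))


# Synthetic data: 20 random unit embeddings; the query is a noisy copy of item 7.
d, k = 512, 256
rng = random.Random(1)
datastore = [unit([rng.gauss(0.0, 1.0) for _ in range(d)]) for _ in range(20)]
query = unit([x + 0.01 * rng.gauss(0.0, 1.0) for x in datastore[7]])

# The same matrix is applied to the datastore and to the query.
R = make_projection(d, k)
proj_store = [project(R, item) for item in datastore]
proj_query = project(R, query)

print("nearest item, original space: ", top1(query, datastore))
print("nearest item, projected space:", top1(proj_query, proj_store))
```

In this toy setup only the projected vectors would need to be stored and searched. How the projection matrix is generated, stored, and protected in the actual method is not described in the abstract and is left open here.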