Private Retrieval Augmented Generation via Random Projection

ACL ARR 2026 January Submission1655 Authors

30 Dec 2025 (modified: 20 Mar 2026) | ACL ARR 2026 January Submission | Readers: Everyone | License: CC BY 4.0

Keywords: Retrieval Augmented Generation, Privacy, Large Language Model

Abstract: Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models (LLMs) by querying external structured knowledge. However, it can also introduce privacy risks by leaking sensitive information from the retrieval database. We propose a simple method to preserve datastore privacy in RAG systems via random projection. By applying the same projection to both datastore embeddings and query embeddings, our method provably preserves semantic similarity between queries and retrieved items while substantially mitigating data extraction attacks. Across multiple RAG architectures and datasets, we show that this lightweight approach achieves superior retrieval and generation performance compared to prior methods with formal differential privacy (DP) guarantees, while exhibiting comparable empirical privacy under strong attack models. Our results suggest, for the first time, that random projection can serve as a competitive and practical baseline for privacy-preserving RAG systems.
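
The core idea in the abstract, applying one shared random projection to both datastore and query embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian projection matrix, the dimensions (d = 128, k = 64), the synthetic embeddings, and all function names below are assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows that cosine similarity, and hence top-1 retrieval, is approximately preserved when both sides are projected with the same matrix.

```python
import math
import random


def random_projection_matrix(k, d, rng):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries.

    Assumption: a plain Gaussian random projection; the paper may use
    a different construction.
    """
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(k)]


def project(matrix, vec):
    """Multiply the projection matrix by a single embedding vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top1(query, items):
    """Index of the item with the highest cosine similarity to the query."""
    return max(range(len(items)), key=lambda i: cosine(query, items[i]))


rng = random.Random(0)
d, k, n = 128, 64, 50  # hypothetical embedding dim, projected dim, datastore size

# Synthetic stand-ins for datastore embeddings (not real model outputs).
datastore = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
# A query that is a slightly perturbed copy of item 7.
query = [x + 0.1 * rng.gauss(0.0, 1.0) for x in datastore[7]]

# The same matrix R is applied to both the datastore and the query.
R = random_projection_matrix(k, d, rng)
private_store = [project(R, v) for v in datastore]
private_query = project(R, query)

best_raw = top1(query, datastore)
best_private = top1(private_query, private_store)
sim_raw = cosine(query, datastore[7])
sim_private = cosine(private_query, private_store[7])
print(best_raw, best_private, round(sim_raw, 3), round(sim_private, 3))
```

In this toy setup only the projected vectors would need to be stored or exposed to the retriever. How much protection this gives against data extraction depends on the attack model evaluated in the paper, which the sketch does not attempt to reproduce.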

Paper Type: Short

Research Area: Retrieval-Augmented Language Models

Research Area Keywords: Language Modeling, Information Retrieval and Text Mining

Contribution Types: NLP engineering experiment

Languages Studied: English

Submission Number: 1655
