Track: Tiny Paper Track (between 2 and 4 pages)
Keywords: Differential Privacy; Large Language Model; Retrieval-Augmented Generation
Abstract: Large Language Models (LLMs) have gained widespread interest and driven advancements across various fields. Retrieval-Augmented Generation (RAG) enables LLMs to incorporate domain-specific knowledge without retraining. However, evidence shows that RAG poses significant privacy risks, as sensitive information stored in the retrieval database can leak through generated outputs. In this work, we propose a private randomized mechanism that projects both the queries and the datastore into a lower-dimensional space using Gaussian matrices, while preserving the similarity structure needed for effective retrieval. Empirical evaluation on different RAG architectures demonstrates that, compared to prior methods, our solution achieves strong empirical privacy protection with negligible impact on generation quality and latency.
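The core idea described in the abstract — projecting vectors through a shared Gaussian matrix so that similarities are approximately preserved (a Johnson–Lindenstrauss-style random projection) — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the dimensions, scaling, and similarity measure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 768, 128   # original and projected dimensions (illustrative values)
n = 1000          # number of datastore vectors

# Shared Gaussian projection matrix, scaled by 1/sqrt(k) so that
# inner products and norms are preserved in expectation.
G = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))

# Synthetic datastore and query embeddings standing in for real ones.
datastore = rng.normal(size=(n, d))
query = rng.normal(size=(d,))

# Both sides are projected with the SAME matrix, so retrieval can be
# performed entirely in the lower-dimensional space.
proj_store = datastore @ G   # shape (n, k)
proj_query = query @ G       # shape (k,)

# Cosine-similarity retrieval in the projected space.
def top1(store, q):
    sims = (store @ q) / (np.linalg.norm(store, axis=1) * np.linalg.norm(q))
    return int(np.argmax(sims))

best = top1(proj_store, proj_query)

# Norms are approximately preserved by the projection.
ratio = np.linalg.norm(proj_query) / np.linalg.norm(query)
```

Because only the projected vectors are stored and exchanged, the original high-dimensional embeddings (and the text they encode) are not directly exposed, while nearest-neighbor retrieval remains approximately correct.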
Submission Number: 33