Keywords: Personalized RAG, Collaborative Filtering, Information Retrieval, Clustering, Large Language Models.
Abstract: Personalized Retrieval-Augmented Generation (RAG) relies on accurately selecting user-relevant documents. In practice, existing approaches often incur high retrieval costs and overlook collaborative signals from similar users, which can improve personalized generation for the current user. We propose ClusterRAG, a cluster-based collaborative filtering framework for personalized Retrieval-Augmented Generation. ClusterRAG represents users through their profile documents, organizes users into semantically coherent clusters using density-based clustering, and performs retrieval at both the cluster and document levels via cluster-level similarity and fine-grained ranking. Extensive experiments on the LaMP benchmark demonstrate that jointly leveraging the target user’s profile and the profiles of the most similar users consistently yields the best performance across diverse tasks. Further analysis shows that ClusterRAG integrates seamlessly with different dense retrievers and rankers, and remains effective when paired with both fine-tuned and zero-shot language models.
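The pipeline described in the abstract (profile-based user representations, density-based clustering, then cluster-level and document-level retrieval) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-d vectors stand in for dense retriever embeddings, mean pooling of document vectors is an assumed way to represent a user, the hand-written DBSCAN with cosine distance is one possible density-based clustering choice, and the names `retrieve`, `eps`, `min_pts`, `top_users`, and `k` are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    # Assumption: a user is represented by the mean of their document vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over cosine distance; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points))
                if 1.0 - cosine(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue             # already assigned, do not expand
            labels[j] = cluster
            reach = neighbors(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # j is a core point, keep expanding
    return labels

def retrieve(query_vec, target, profiles, eps=0.1, min_pts=2, top_users=1, k=2):
    users = list(profiles)
    user_vecs = [mean_vector([v for _, v in profiles[u]]) for u in users]
    labels = dbscan(user_vecs, eps, min_pts)
    t = users.index(target)
    # Cluster level: users sharing the target's cluster, ranked by user similarity.
    peers = [i for i in range(len(users))
             if i != t and labels[t] != -1 and labels[i] == labels[t]]
    peers.sort(key=lambda i: cosine(user_vecs[t], user_vecs[i]), reverse=True)
    selected = [t] + peers[:top_users]
    # Document level: rank documents of the target and selected peers by query similarity.
    candidates = [(users[i], text, vec)
                  for i in selected for text, vec in profiles[users[i]]]
    candidates.sort(key=lambda c: cosine(query_vec, c[2]), reverse=True)
    return [(u, text) for u, text, _ in candidates[:k]]

# Toy profiles: (document id, embedding) pairs per user; all values are illustrative.
profiles = {
    "alice": [("a1", [1.0, 0.0]), ("a2", [0.9, 0.1])],
    "bob":   [("b1", [1.0, 0.1]), ("b2", [0.8, 0.0])],
    "carol": [("c1", [0.0, 1.0])],
}
result = retrieve([1.0, 0.1], "alice", profiles)
print(result)
```

In this toy setup the documents passed to the generator mix the target user's own profile with documents from a clustered peer, which is the collaborative signal the abstract refers to; a user who falls outside every cluster simply falls back to their own profile.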
Paper Type: Long
Research Area: Retrieval-Augmented Language Models
Research Area Keywords: Generation, Information Retrieval and Text Mining, Language Modeling.
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 6748