Abstract: Retrieval Augmented Generation (RAG) has recently emerged as a way to provide up-to-date
and accurate data to Large Language Models (LLMs). Although RAG has been successful in reducing hallucinations in LLMs, it remains susceptible to inaccurate and maliciously manipulated data. In this
paper, we present Distributed-RAG (D-RAG), a novel blockchain-based framework designed to improve
the integrity of RAG systems. D-RAG mitigates the risk of malicious data by replacing RAG's
traditionally centralized database with communities, each consisting of a database and a permissioned
blockchain. Each community is organized around a subject and consists of domain experts who
verify data through a privacy-preserving consensus protocol before it is added to the community's database. A Retrieval
Blockchain coordinates communication among the communities. For each query, miners on this Retrieval Blockchain retrieve documents from the community databases and rank
them using an LLM. The miners reach agreement on these rankings, and the top-ranked documents are provided to the
LLM along with the query to generate a response. We evaluate the proposed D-RAG framework experimentally,
and our results show that the Retrieval Blockchain is scalable and that the privacy-preserving consensus protocol remains efficient as the number of community members increases. These results demonstrate that
D-RAG can maintain data integrity at scale in a real-world application setting.
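The abstract states that miners rank retrieved documents with an LLM and then agree on those rankings, but it does not say how the individual rankings are combined. As one illustration only, the sketch below assumes each miner submits an ordered list of document IDs and that the lists are merged with a Borda count, a standard rank-aggregation rule swapped in here and not necessarily the one D-RAG uses. The helpers `aggregate_rankings` and `build_prompt` and the prompt layout are hypothetical.

```python
from collections import defaultdict


def aggregate_rankings(miner_rankings: list[list[str]], k: int) -> list[str]:
    """Merge per-miner document rankings with a Borda count and return the top-k IDs.

    Hypothetical helper: Borda count is an assumed aggregation rule,
    not necessarily D-RAG's actual agreement mechanism.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in miner_rankings:
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            # First place earns n points, last place earns 1;
            # documents a miner did not rank earn 0 from that miner.
            scores[doc_id] += n - position
    # Sort by descending score and break ties by document ID, so every
    # honest miner running this on the same inputs gets the same result.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:k]


def build_prompt(query: str, docs: dict[str, str], top_ids: list[str]) -> str:
    """Hypothetical prompt layout: agreed top-ranked documents, then the query."""
    context = "\n\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in top_ids)
    return f"Context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    # Three miners rank the same three retrieved documents.
    rankings = [["d1", "d2", "d3"], ["d2", "d1", "d3"], ["d1", "d3", "d2"]]
    top = aggregate_rankings(rankings, k=2)
    print(top)  # ['d1', 'd2']  (scores: d1=8, d2=6, d3=4)
    docs = {"d1": "First document.", "d2": "Second document.", "d3": "Third document."}
    print(build_prompt("What does d1 say?", docs, top))
```

Deterministic tie-breaking matters in a blockchain setting: if miners ordered tied documents arbitrarily, honest nodes could compute different top-k sets from identical inputs and fail to agree.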