Large language model (LLM)-based applications increasingly leverage retrieval-augmented generation (RAG) techniques to provide reliable responses, particularly for queries that demand private, domain-specific knowledge. Practical constraints, such as data sovereignty regulations, can hinder the centralized aggregation of private knowledge. This creates challenges when (1) a user poses a question without knowing which applications hold the relevant knowledge, or (2) the question requires cross-domain knowledge to answer.
In this work, we abstract each RAG application with private knowledge as a RAG-based agent. We propose \algo, a framework that orchestrates multiple RAG-based agents with private knowledge bases through an efficient and accurate routing mechanism and an iterative refining-solving mechanism. The server routes each query to the most relevant agents by identifying the most similar knowledge clusters in a shared vector space. For complex questions, the server iteratively aggregates agent responses into intermediate results and refines the question to bridge the gap toward a comprehensive answer. Extensive experiments demonstrate the effectiveness of \algo: the routing algorithm precisely selects the relevant agents and yields accurate responses to single-hop queries, and the iterative strategy achieves accurate, multi-step resolutions for complex queries.
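To make the routing idea concrete, the following is a minimal sketch of similarity-based routing under the assumption that each agent is summarized by centroid vectors of its private knowledge clusters; the agent names, the toy hashing embedding, and the `route` function are illustrative stand-ins, not the framework's actual implementation.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy hashing embedding; a real system would use a sentence-embedding model."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# Hypothetical agents, each represented by centroids of its knowledge clusters.
KNOWLEDGE_CLUSTERS = {
    "finance_agent": [embed("quarterly revenue filings"), embed("tax compliance rules")],
    "medical_agent": [embed("clinical trial protocols")],
}

def route(query: str, top_k: int = 1) -> list[str]:
    """Return the agents whose knowledge clusters are most similar to the query."""
    q = embed(query)
    scores = {}
    for agent, centroids in KNOWLEDGE_CLUSTERS.items():
        # Score each agent by its best-matching cluster (cosine similarity,
        # since all vectors are normalized).
        scores[agent] = max(float(q @ c) for c in centroids)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(route("Which tax rules apply to quarterly revenue?"))  # likely ["finance_agent"]
```

In this sketch, routing reduces to a nearest-cluster lookup in the shared vector space; the iterative refining-solving loop described above would repeatedly call such a router with the refined question until the intermediate results suffice to compose a final answer.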