Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning

Published: 19 Dec 2025, Last Modified: 05 Jan 2026, AAMAS 2026 Full, CC BY 4.0
Keywords: Multi-Agent, RAG, LLMs, Reinforcement Learning
TL;DR: Multi-Agent RAG system with Minimalist Reinforcement Learning
Abstract: Large Language Models (LLMs) equipped with modern Retrieval-Augmented Generation (RAG) systems often employ multi-turn interaction pipelines to interface with search engines for complex reasoning tasks. However, such multi-turn interactions inevitably produce long intermediate contexts, as context length grows exponentially with exploration depth. This leads to a well-known limitation of LLMs: their difficulty in effectively leveraging information from long contexts. This problem is further amplified in RAG systems that depend on in-context learning, where few-shot demonstrations must also be included in the prompt, compounding the context-length bottleneck. To address these challenges, we propose \textbf{Mujica-MyGO}, a unified framework for efficient multi-turn reasoning in RAG. Inspired by the divide-and-conquer principle, we introduce \textbf{Mujica} (Multi-hop Joint Intelligence for Complex Question Answering), a multi-agent RAG workflow that decomposes multi-turn interactions into cooperative sub-interactions, thereby mitigating long-context issues. To eliminate the dependency on in-context learning, we further develop \textbf{MyGO} (Minimalist Policy Gradient Optimization), a lightweight and efficient reinforcement learning algorithm that enables effective post-training of LLMs within complex RAG pipelines. We provide theoretical guarantees for MyGO's convergence to the optimal policy. Empirical evaluations across diverse question-answering benchmarks, covering both text corpora and knowledge graphs, show that \textbf{Mujica-MyGO} achieves superior performance.
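
The abstract does not spell out MyGO's update rule, so the sketch below is a minimal illustration only, assuming a generic REINFORCE-style policy-gradient step of the kind a "minimalist" policy-gradient method builds on; it should not be read as the paper's algorithm. `ToyPolicy`, `policy_gradient_step`, the 4-dimensional toy state, and the binary exact-match-style reward are all hypothetical placeholders standing in for an LLM agent and a QA correctness signal.

```python
# Illustrative vanilla policy-gradient (REINFORCE) step.
# NOT the paper's MyGO algorithm; all names here are hypothetical.
import torch
import torch.nn as nn


class ToyPolicy(nn.Module):
    """A toy categorical policy standing in for an LLM agent's action head."""

    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Linear(state_dim, num_actions)

    def forward(self, state: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(state))


def policy_gradient_step(policy, optimizer, states, actions, rewards):
    """One plain policy-gradient step: maximize E[R * log pi(a|s)]."""
    dist = policy(states)
    log_probs = dist.log_prob(actions)
    loss = -(rewards * log_probs).mean()  # REINFORCE objective, no baseline
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


# Toy usage: 4-dim states, 3 discrete actions, binary QA-style reward.
policy = ToyPolicy(state_dim=4, num_actions=3)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
states = torch.randn(8, 4)
actions = policy(states).sample()            # sampled trajectory actions
rewards = torch.randint(0, 2, (8,)).float()  # e.g., answer-correctness reward
policy_gradient_step(policy, optimizer, states, actions, rewards)
```

In the paper's setting the policy would be an LLM acting inside the Mujica multi-agent RAG workflow and the reward would come from answer quality, but those details (and whatever makes MyGO "minimalist" relative to standard estimators) are not specified in this abstract.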
Area: Generative and Agentic AI (GAAI)
Generative AI: I acknowledge that I have read and will follow this policy.
Submission Number: 837