DECOR: Learning to Decompose and Collaborate in Deep Search via Multi-Agent Reinforcement Learning

Ruiqing Chen; Zekun Zhang; Gong-Duo Zhang; Lihong Gu; Lin Zhou

Published: 30 Apr 2026, Last Modified: 24 Jun 2026, ICML 2026 regular, CC BY 4.0

TL;DR: DECOR is a MARL framework for deep search that jointly trains Planner, Filter, and Answerer agents. By optimizing collaboration via hybrid rewards, it outperforms static chains on seven reasoning benchmarks.

Abstract: Monolithic agents in deep search often suffer from "cognitive overload," while existing multi-agent approaches mostly rely on frozen models that cannot learn from collaboration failures. To bridge this gap, we propose $\textbf{DECOR}$ ($\textbf{DE}$compose and $\textbf{CO}$llaborate via $\textbf{R}$ole-specialized agents), a framework formulating deep search as a Multi-Agent Reinforcement Learning (MARL) problem. DECOR functionally decomposes the task into three specialized roles: a $\textit{Planner}$ to navigate, a $\textit{Filter}$ to curate a noise-reduced memory, and an $\textit{Answerer}$ for synthesis. Unlike training-free orchestration, we jointly optimize these agents using a hybrid reward strategy that harmonizes role-specific intrinsic feedback with team-level outcome signals. Experiments on seven benchmarks show that DECOR significantly outperforms strong monolithic baselines, demonstrating the necessity of learning-based functional decomposition in handling cognitive overload.
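
The abstract says the hybrid reward combines role-specific intrinsic feedback with a team-level outcome signal, but it does not give the formula. The sketch below is one plausible reading, not the authors' method: it assumes a simple convex mix with a hypothetical weight `alpha`, and the names `hybrid_rewards`, `HybridRewardConfig`, and the role keys are illustrative only.

```python
from dataclasses import dataclass

# Role names follow the abstract; the string keys themselves are an assumption.
ROLES = ("planner", "filter", "answerer")


@dataclass(frozen=True)
class HybridRewardConfig:
    # Hypothetical mixing weight (not taken from the paper):
    # 1.0 means purely role-specific feedback, 0.0 means purely team outcome.
    alpha: float = 0.5


def hybrid_rewards(
    intrinsic: dict[str, float],
    team_outcome: float,
    cfg: HybridRewardConfig = HybridRewardConfig(),
) -> dict[str, float]:
    """Mix each role's intrinsic reward with the shared team-level outcome."""
    missing = set(ROLES) - set(intrinsic)
    if missing:
        raise ValueError(f"missing intrinsic reward for roles: {sorted(missing)}")
    if not 0.0 <= cfg.alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        role: cfg.alpha * intrinsic[role] + (1.0 - cfg.alpha) * team_outcome
        for role in ROLES
    }


# Example episode: the Planner did well, the Filter did poorly,
# and the team still produced a correct final answer.
rewards = hybrid_rewards(
    {"planner": 1.0, "filter": 0.0, "answerer": 0.5},
    team_outcome=1.0,
)
```

Under this assumed mix, every agent shares credit for the team outcome while still receiving a signal tied to its own role, which is the kind of balance the abstract describes. The actual reward design in the paper may differ.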

Lay Summary: When artificial intelligence (AI) tries to answer complex questions by searching the internet, it often gets overwhelmed. Forcing a single AI program to simultaneously plan searches, read dozens of articles, and deduce the final answer causes "cognitive overload." Distracted by irrelevant information, the AI gets confused and starts making things up. To solve this, we created DECOR, a system that splits this massive task among a team of three specialized AI agents: a "Navigator" to steer the search, a "Librarian" to filter out junk, and a "Writer" to draft the final answer. Instead of giving them rigid instructions, we trained this AI team using a reward system, allowing them to learn from their mistakes and figure out how to collaborate. By having the Librarian hide irrelevant noise, the Writer stays focused purely on the facts. Our experiments show this collaborative team is vastly more accurate than a single, multitasking AI. This approach paves the way for highly reliable AI research assistants that can dig through massive amounts of online information to find the truth without getting distracted.

Primary Area: Deep Learning->Large Language Models

Keywords: reinforcement learning in agents, multi-agent systems, LLM agents

Submission Number: 6650
