AgentPO: Enhancing Multi-Agent Collaboration via Reinforcement Learning

Published: 26 Jan 2026, Last Modified: 11 Apr 2026ICLR 2026 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Multi-Agent Systems, Reinforcement Learning
TL;DR: We propose AgentPO, a reinforcement learning framework that trains a Collaborator agent to optimize multi-agent cooperation.
Abstract: Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems through distributed reasoning and collaboration. However, their effectiveness is often hindered by the challenge of optimizing interactions among agents. To address this, we introduce AgentPO, a novel framework that directly optimizes agent collaboration. AgentPO employs reinforcement learning to train a specialized Collaborator agent, which refines its interaction policy to enhance overall system performance within a fixed multi-agent topology. We evaluated AgentPO on multiple mathematical reasoning tasks, where it consistently outperformed strong baselines. With Llama-3.2-3B-Instruct as the actor model, AgentPO achieves accuracy improvements of +1.8\% and +7.2\% over strong baselines like Role Assignment and EvoAgent, respectively. When using the larger Llama-3.1-8B-Instruct model, these gains increase to +5.6\% and +11.3\%. Crucially, AgentPO achieves these results with remarkable efficiency: it requires only 500 training samples and operates at just 7.8\% of EvoAgent's inference cost, highlighting its superior scalability and practicality.
Primary Area: reinforcement learning
Submission Number: 14950
Loading