Abstract: A large body of work in Multi-Agent Systems (MAS) addresses modeling and solving problems with multiple interacting agents. However, most LLMs are pretrained independently and not specifically optimized for coordination. For example, existing LLM fine-tuning frameworks rely on individual rewards, which require complex reward designs for each agent to encourage collaboration. To address this challenge, we model LLM collaboration as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. We develop a multi-agent, multi-turn algorithm, Multi-Agent Group Relative Policy Optimization (MAGRPO), to solve it, building on current RL approaches for LLMs as well as MARL techniques. Our experiments on LLM writing and coding collaboration demonstrate that fine-tuning multiple LLMs with MAGRPO enables agents to generate high-quality responses efficiently through effective cooperation. Our approach opens the door to using MARL methods for LLM collaboration and highlights the associated challenges.
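The abstract does not spell out MAGRPO's update rule. As a rough illustration only, a GRPO-style group-relative advantage with a single shared cooperative reward per joint response (all function names and numbers below are hypothetical, not from the paper) might be sketched as:

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize a group of rewards to group-relative advantages:
    (r - group mean) / group std, as in GRPO-style methods."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# Cooperative setting: each sampled joint response from the agent group
# receives one shared reward, so every agent in the group gets the same
# advantage for that joint rollout (hypothetical reward values).
shared_rewards = [0.2, 0.8, 0.5, 0.9]
advs = group_relative_advantages(shared_rewards)
```

Because the reward is shared across agents, no per-agent reward shaping is needed; the group-relative normalization alone ranks joint responses against each other.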