Multi-Agent Evolve: LLM Self-Improve through Multi-Agent Co-evolution

Yixing Chen; Yiding Wang; Haofei Yu; Tao Feng; Siqi Zhu; Muhan Zhang; Mostofa Patwary; Jiaxuan You

19 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0.

Keywords: large language models, self-improvement, reinforcement learning

Abstract: Reinforcement Learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). However, the success of RL for LLMs relies heavily on human-curated datasets and verifiable rewards, which limits its scalability and generality. Recent Self-Play RL methods, inspired by the paradigm's success in games such as Go, aim to enhance LLM reasoning capabilities without human-annotated data. However, these methods primarily depend on a grounded environment for feedback (_e.g._, a Python interpreter or a game engine), which makes extending them to general domains challenging. To address these challenges, we propose **Multi-Agent Evolve (MAE)**, a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general-knowledge Q&A. MAE instantiates a triplet of interacting agents (_Proposer_, _Solver_, _Judge_) from a single LLM and applies reinforcement learning to optimize their behaviors. The Proposer generates questions, the Solver attempts solutions, and the Judge evaluates both, with all three roles co-evolving during training. Experiments on Qwen2.5-3B-Instruct demonstrate that MAE achieves an average improvement of 4.86% across multiple benchmarks, surpassing previous methods. These results highlight MAE as a scalable, data-efficient method for enhancing the general reasoning abilities of LLMs with minimal reliance on human-curated supervision.
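The Proposer, Solver, and Judge loop described in the abstract can be illustrated with a minimal Python sketch of one self-play round. This is not the authors' implementation. The role-prompt interface `LLM`, the helper names `judge_score` and `self_play_round`, the clamped 0-to-1 scoring, and the equal weighting of question quality and difficulty in the Proposer reward are all hypothetical choices made for illustration. The RL update that MAE applies to these rewards is not shown.

```python
from dataclasses import dataclass
from itertools import cycle
from statistics import mean
from typing import Callable, List

# Hypothetical interface: one shared LLM, queried with a role name and a prompt.
LLM = Callable[[str, str], str]


@dataclass
class RoundResult:
    question: str
    answers: List[str]
    solver_rewards: List[float]
    proposer_reward: float


def judge_score(llm: LLM, prompt: str) -> float:
    """Query the Judge role for a score clamped to [0, 1]; unparsable output scores 0."""
    try:
        return min(max(float(llm("judge", prompt)), 0.0), 1.0)
    except ValueError:
        return 0.0


def self_play_round(llm: LLM, topic: str, n_samples: int = 4) -> RoundResult:
    # Proposer: the shared model, prompted to write a question.
    question = llm("proposer", f"Write a question about: {topic}")
    # Solver: the same model, sampled several times on that question.
    answers = [llm("solver", question) for _ in range(n_samples)]
    # Judge: the same model scores each answer; these scores are the Solver rewards.
    solver_rewards = [
        judge_score(llm, f"Question: {question}\nAnswer: {a}\nScore 0-1:")
        for a in answers
    ]
    # Judge also rates the question itself (assumed quality signal).
    quality = judge_score(llm, f"Rate this question: {question}\nScore 0-1:")
    # Assumed Proposer reward: half question quality, half difficulty (1 - Solver success).
    difficulty = 1.0 - mean(solver_rewards)
    return RoundResult(question, answers, solver_rewards,
                       0.5 * quality + 0.5 * difficulty)


# Toy stand-in for the shared LLM, for illustration only.
_solver_outputs = cycle(["5", "6"])


def toy_llm(role: str, prompt: str) -> str:
    if role == "proposer":
        return "What is 2 + 3?"
    if role == "solver":
        return next(_solver_outputs)
    if prompt.startswith("Rate this question"):
        return "1.0"
    answer = prompt.split("Answer: ")[1]
    return "1.0" if answer.startswith("5") else "0.0"


result = self_play_round(toy_llm, "arithmetic")
print(result.solver_rewards, result.proposer_reward)
# [1.0, 0.0, 1.0, 0.0] 0.75
```

Because every role calls the same `llm`, a single set of weights would receive reward signal from all three roles. The toy model only makes the reward arithmetic checkable: two of four answers are correct, so difficulty is 0.5, and combined with a question-quality score of 1.0 the assumed Proposer reward is 0.75.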

Primary Area: reinforcement learning

Submission Number: 15538