Keywords: agent foundation models, agent frameworks, LLMs
TL;DR: Chain-of-Agents (CoA) distills multi-agent systems into a single end-to-end LLM paradigm, enabling efficient agent-like problem solving and achieving state-of-the-art results with open-sourced Agent Foundation Models (AFMs).
Abstract: Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built upon manual prompt/workflow engineering with sophisticated agent frameworks, making them computationally inefficient, less capable, and unable to benefit from data-centric learning. In this work, we introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables native end-to-end complex problem solving in the same manner as a multi-agent system (i.e., multi-turn problem solving with multiple tools and multiple agents) within one model. In chain-of-agents problem solving, the model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. To elicit end-to-end chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent distillation framework that distills state-of-the-art multi-agent systems into chain-of-agents trajectories for agentic supervised fine-tuning. We then apply agentic reinforcement learning on verifiable agentic tasks to further improve the models' chain-of-agents problem-solving capabilities. We call the resulting models Agent Foundation Models (AFMs). Our empirical studies demonstrate that AFMs establish new state-of-the-art performance across diverse benchmarks in search, math, and code settings.
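To make the paradigm concrete, here is a minimal, hypothetical sketch of a chain-of-agents loop: a single model emits tagged spans that activate a tool agent, the tool's observation is appended to the context, and the model continues end-to-end. All names and tags below (`fake_model`, `search_tool`, `<search>`, `<observation>`, `<answer>`) are illustrative assumptions, not the paper's actual API.

```python
import re

def fake_model(context: str) -> str:
    """Stand-in for a single CoA-trained LLM (returns canned output for illustration)."""
    if "<observation>" not in context:
        return "<think>Need a fact.</think><search>capital of France</search>"
    return "<answer>Paris</answer>"

def search_tool(query: str) -> str:
    """Stand-in tool agent; a real system would call a search backend."""
    return "Paris is the capital of France."

def chain_of_agents(question: str, max_turns: int = 4) -> str:
    """Single-model multi-turn loop: the model, not a framework, decides
    which agent to activate next by emitting the corresponding tag."""
    context = question
    for _ in range(max_turns):
        out = fake_model(context)
        m = re.search(r"<search>(.*?)</search>", out)
        if m:  # model activated the search tool agent
            obs = search_tool(m.group(1))
            context += out + f"<observation>{obs}</observation>"
            continue
        m = re.search(r"<answer>(.*?)</answer>", out)
        if m:  # model produced a final answer
            return m.group(1)
    return ""

print(chain_of_agents("What is the capital of France?"))  # → Paris
```

In this sketch the entire rollout is one token stream from one model, which is what makes the trajectory amenable to supervised fine-tuning and reinforcement learning, in contrast to a framework that orchestrates separate prompted agents.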
Primary Area: foundation or frontier models, including LLMs
Submission Number: 5461