Keywords: Large Language Model, Multi-Agent System, Reinforcement Learning, Autonomous Ability.
TL;DR: We propose MasHost, a novel reinforcement learning-based framework that enables the fully autonomous construction of query-specific multi-agent systems.
Abstract: Large Language Model (LLM)-driven multi-agent systems (Mas) have recently emerged as a powerful paradigm for tackling complex real-world tasks. However, existing Mas design strategies typically rely on manually crafted interaction mechanisms or heuristic rules, introducing human biases and constraining autonomy. Even recent advances that claim to construct Mas adaptively still operate in a semi-autonomous fashion. In this work, we introduce \texttt{MasHost}, a reinforcement learning (RL)-based framework designed for autonomous and query-adaptive Mas generation.
First, we formulate Mas generation as a graph search problem and propose Hierarchical Relative Policy Optimization (HRPO), a novel RL strategy that combines group-level relative advantages with fine-grained action-wise rewards. Second, we design \texttt{MasHost} to jointly sample agent roles and their interactions through a unified probabilistic sampling mechanism, enabling adaptive and coherent Mas construction. Beyond the conventional emphasis on accuracy and efficiency, \texttt{MasHost} introduces component rationality as a new criterion, offering a fresh perspective on the principled design of multi-agent systems. To our knowledge, \texttt{MasHost} is the first RL-driven framework for autonomous Mas graph construction. Extensive experiments on six benchmarks demonstrate that \texttt{MasHost} consistently outperforms the most competitive baselines, validating its effectiveness, efficiency, and structural rationality.
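The HRPO idea of blending a group-level relative advantage with per-action rewards can be sketched as follows. This is a minimal illustration, not the paper's implementation: the weighting coefficient `beta` and the function `hrpo_advantages` are assumptions introduced here for exposition.

```python
import numpy as np

def hrpo_advantages(group_rewards, action_rewards, beta=0.5):
    """Hedged sketch of an HRPO-style advantage.

    group_rewards: one scalar final reward per sampled Mas graph.
    action_rewards: per-graph list of fine-grained action-wise rewards.
    beta: hypothetical mixing weight (not specified in the abstract).
    """
    r = np.asarray(group_rewards, dtype=float)
    # Group-level relative advantage: normalize each trajectory's final
    # reward against the group mean and std (GRPO-style).
    group_adv = (r - r.mean()) / (r.std() + 1e-8)
    # Blend the shared trajectory-level advantage with each trajectory's
    # per-action reward signal, yielding one advantage per action.
    return [
        beta * g + (1.0 - beta) * np.asarray(a, dtype=float)
        for g, a in zip(group_adv, action_rewards)
    ]

# Example: three sampled Mas graphs, each with a final reward and a
# sequence of action-wise rewards of varying length.
adv = hrpo_advantages([1.0, 0.0, 0.5], [[0.2, 0.8], [0.1], [0.5, 0.5, 0.0]])
```

Each action thus receives credit both from how its whole graph fared relative to the sampled group and from its own local reward, which is the "hierarchical" combination the abstract describes.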
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8610