Enhancing Multi-Agent Learning in Real-World Interactive Environments through Process Reward Decomposition

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: Language model, Multi-agent learning
Abstract: LLM-based agents have made significant advancements in interactive environments, such as mobile operations and web browsing, with multi-agent systems further boosting performance. However, current agent learning techniques heavily rely on in-domain data and struggle to generalize across tasks and environments. Moreover, existing multi-agent learning methods are limited by fixed role assignments, which restrict their flexibility and generalization. Furthermore, the multi-step nature of interactive tasks, combined with sparse end-to-end reward signals, severely hinders effective learning. To address these issues, we propose $\textit{CollabUIAgents}$, a two-stage multi-agent learning framework for interactive environments. In the first stage, the base model is adapted to the environment using curriculum learning on multi-level instruction data. In the second stage, a novel process reward decomposition strategy is introduced during reinforcement learning, allowing rewards to be distributed at both the agent and conversation round levels. This granular feedback fosters collaborative awareness among agents without predefined roles and improves learning efficacy. Experimental results show that our method significantly enhances the performance of multi-agent systems based on open-source models, achieving notable improvements both within and across domains, while also exhibiting strong cross-environment generalization. Moreover, our best-performing systems achieve results on par with, or exceeding, those of strong closed-source models, while remaining flexible enough to be integrated with prompt-based multi-agent systems in future research.
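
To make the reward decomposition idea concrete, the following is a minimal, illustrative Python sketch of how a sparse end-of-episode reward might be redistributed to individual agents at individual conversation rounds. The function name, the trajectory representation, and the discounted allocation scheme are all assumptions chosen for illustration; they are not the paper's actual decomposition strategy, which the abstract does not specify.

```python
def decompose_reward(trajectory, final_reward, gamma=0.95):
    """Illustrative decomposition of a sparse end-to-end reward into
    per-agent, per-round credits.

    ASSUMPTION: the discounted back-propagation of the outcome reward
    used here is a placeholder, not the method described in the paper.

    trajectory: list of (agent_id, round_idx) steps in temporal order.
    final_reward: scalar reward observed at the end of the episode.
    Returns: dict mapping (agent_id, round_idx) -> decomposed reward.
    """
    n = len(trajectory)
    credits = {}
    for t, (agent_id, round_idx) in enumerate(trajectory):
        # Steps closer to the final outcome receive more credit.
        credits[(agent_id, round_idx)] = final_reward * gamma ** (n - 1 - t)
    return credits

# Example: three hypothetical agents interacting over two conversation rounds.
traj = [("agent_a", 0), ("agent_b", 0), ("agent_a", 1), ("agent_c", 1)]
print(decompose_reward(traj, final_reward=1.0))
```

Under this kind of scheme, each agent receives a distinct learning signal per round rather than a single shared episode reward, which is the granular feedback the abstract attributes to the proposed strategy.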
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7928