Keywords: Multi-Agent Reinforcement Learning, MARL, Population-Based Training, Policy Bank, Shared Experience Learning, Self-Learning Intelligent Agents, Trajectory Merging, Centralized Training and Decentralized Execution (CTDE), Task Decomposition, Task Distribution
TL;DR: Many agents learn unpredictable tasks with a two-phase iterative MARL approach: (a) optimally decompose the task and distribute activities; (b) execute each activity with the best policy from a shared policy bank and refine it through shared experience learning for continuous evolution.
Abstract: This research introduces a multi-agent self-learning solution for large, complex tasks in dynamic and unpredictable environments, where large groups of homogeneous agents coordinate to achieve collective goals. Using a novel iterative two-phase multi-agent reinforcement learning approach, agents continuously learn and evolve as they perform the task. In phase one, agents collaboratively determine an effective global task distribution based on the current state of the task and assign the most suitable agent to each activity. In phase two, each selected agent refines its activity execution using a shared policy drawn from a policy bank built from collective past experience. Merging trajectories across similar agents through a novel shared experience learning mechanism enables continuous adaptation, while iterating through the two phases significantly reduces coordination overhead. The approach was evaluated on an exemplary drone-based test system, including real-world scenarios in domains such as forest firefighting, where it performed well by evolving autonomously in new environments with large numbers of agents. By adapting quickly to new and changing environments, this versatile approach provides a highly scalable foundation for many other applications in dynamic, hard-to-optimize domains that are intractable today.
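To make the two-phase loop concrete, the following is a minimal, illustrative Python sketch of the iteration described in the abstract; all names and mechanics (PolicyBank, assign_activities, the greedy load-balancing assignment, and the running-score update) are assumptions made for illustration and are not the paper's actual implementation.

```python
# Minimal sketch of the two-phase iterative MARL loop (illustrative only).
# Assumed names and stand-in logic; not the authors' implementation.
import random
from collections import defaultdict


class PolicyBank:
    """Shared bank of policies keyed by activity type, pooling experience from all agents."""

    def __init__(self):
        self.policies = defaultdict(lambda: {"score": 0.0, "trajectories": []})

    def best_policy(self, activity_type):
        # Return the stored policy record for this activity type.
        return self.policies[activity_type]

    def merge_trajectories(self, activity_type, trajectory, reward):
        # Shared experience learning (stand-in): pool trajectories from all agents
        # and nudge the policy's running score toward the observed reward.
        record = self.policies[activity_type]
        record["trajectories"].append(trajectory)
        record["score"] += 0.1 * (reward - record["score"])


def assign_activities(activities, agents):
    # Phase one (stand-in): greedily assign each activity to the least-loaded agent.
    load = {agent: 0 for agent in agents}
    assignment = {}
    for activity in activities:
        agent = min(load, key=load.get)
        assignment[activity] = agent
        load[agent] += 1
    return assignment


def execute(agent, activity, policy_record):
    # Phase two (stand-in): "execute" the activity and return a trajectory and reward.
    trajectory = [(agent, activity, step) for step in range(3)]
    reward = policy_record["score"] + random.uniform(0.0, 1.0)
    return trajectory, reward


def run_iteration(activities, agents, bank):
    assignment = assign_activities(activities, agents)           # phase one
    for activity, agent in assignment.items():
        policy = bank.best_policy(activity)                      # shared policy bank
        trajectory, reward = execute(agent, activity, policy)
        bank.merge_trajectories(activity, trajectory, reward)    # shared experience learning


if __name__ == "__main__":
    bank = PolicyBank()
    agents = [f"drone_{i}" for i in range(4)]
    activities = ["scout", "suppress", "relay"]
    for _ in range(10):                                          # iterate the two phases
        run_iteration(activities, agents, bank)
    print({k: round(v["score"], 2) for k, v in bank.policies.items()})
```

The sketch only shows the control flow: in each iteration, assignment precedes execution, and every executed trajectory is merged back into the shared bank so subsequent iterations start from the pooled experience.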
Primary Area: reinforcement learning
Submission Number: 22295