Bi-Level Dynamic Parameter Sharing among Individuals and Teams for Promoting Collaborations in Multi-Agent Reinforcement Learning

Yan Liu; Ying Tiffany He; Fei Yu; Zhong Ming

Bi-Level Dynamic Parameter Sharing among Individuals and Teams for Promoting Collaborations in Multi-Agent Reinforcement Learning

Yan Liu, Ying Tiffany He, Fei Yu, Zhong Ming

Published: 01 Feb 2023, Last Modified: 16 May 2025Submitted to ICLR 2023Readers: Everyone

Keywords: Multi-Agent Reinforcement Learning, Reinforcement Learning

TL;DR: We propose a bi-level dynamic parameter sharing mechanism between individuals and teams, which can not only promote agents to learn diversified strategies, but also promote agents to form more stable and complementary cooperative relationships.

Abstract: Parameter sharing has greatly contributed to the success of multi-agent reinforcement learning in recent years. However, most existing parameter sharing mechanisms are static, and parameters are indiscriminately shared among individuals, ignoring the dynamic environments and different roles of multiple agents. In addition, although a single-level selective parameter sharing mechanism can promote the diversity of strategies, it is hard to establish complementary and cooperative relationships between agents. To address these issues, we propose a bi-level dynamic parameter sharing mechanism among individuals and teams for promoting effective collaborations (BDPS). Specifically, at the individual level, we define virtual dynamic roles based on the long-term cumulative advantages of agents and share parameters among agents in the same role. At the team level, we combine agents of different virtual roles and share parameters of agents in the same group. Through the joint efforts of these two levels, we achieve a dynamic balance between the individuality and commonality of agents, enabling agents to learn more complex and complementary collaborative relationships. We evaluate BDPS on a challenging set of StarCraft II micromanagement tasks. The experimental results show that our method outperforms the current state-of-the-art baselines, and we demonstrate the reliability of our proposed structure through ablation experiments.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)

5 Replies

Loading