MANSA: LEARNING FAST AND SLOW IN MULTI-AGENT SYSTEMS WITH A GLOBAL SWITCHING AGENT

22 Sept 2022 (modified: 13 Feb 2023) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone
Keywords: Multi-agent Systems, Multi-agent Reinforcement Learning
TL;DR: Decentralized learning, while cheap in computation and communication, suffers from well-known convergence issues in MARL. We develop a method that uses centralized learning minimally to alleviate this problem.
Abstract: In multi-agent systems, independent learners (IL) often show remarkable performance and scale easily with the number of agents. Yet training IL can be inefficient, particularly in states that require coordinated exploration. Observing other agents' actions through centralised learning (CL) enables agents to quickly learn to coordinate their behaviour, but employing CL at all states is prohibitively expensive in many real-world applications. Moreover, applying CL often requires strong representational constraints (such as the individual-global-max condition) that can lead to poor performance if violated. In this paper, we introduce a novel IL framework named MANSA that selectively employs CL only at states that require coordination. Central to MANSA is an additional reinforcement learning (RL) agent that uses switching controls to quickly learn when and where to activate CL to boost the performance of IL, while using only IL everywhere else. We prove that MANSA's switching-control mechanism, which can seamlessly adopt any existing multi-agent RL (MARL) algorithm, preserves MARL convergence properties in cooperative settings. Importantly, we prove that MANSA improves performance and, given a limited budget of CL calls, maximises performance subject to that budget. We show empirically in Level-Based Foraging and SMAC settings that MANSA achieves fast, superior training performance through its minimal, selective use of CL.
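
The switching idea in the abstract can be illustrated with a minimal Python sketch; this is not the authors' implementation. Below, a tabular Q-learning switching agent chooses, per state, between an IL and a CL training mode, and the per-call penalty CL_COST approximates a limited budget of CL calls. The toy chain environment and all names (choose_mode, update_switch, env_step) are illustrative assumptions standing in for MANSA's actual learners and budget mechanism.

import random
from collections import defaultdict

random.seed(0)

IL, CL = 0, 1                        # the switching agent's two actions
CL_COST = 0.1                        # hypothetical per-call penalty on CL
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1   # learning rate, discount, exploration

q = defaultdict(lambda: [0.0, 0.0])  # switching agent's Q-table: state -> [Q_IL, Q_CL]

def choose_mode(state):
    """Epsilon-greedy switch between independent (IL) and centralised (CL) learning."""
    if random.random() < EPS:
        return random.choice((IL, CL))
    return IL if q[state][IL] >= q[state][CL] else CL

def update_switch(state, mode, team_reward, next_state):
    """Q-learning update for the switching agent; CL calls are penalised."""
    r = team_reward - (CL_COST if mode == CL else 0.0)
    td_target = r + GAMMA * max(q[next_state])
    q[state][mode] += ALPHA * (td_target - q[state][mode])

def env_step(state, mode):
    """Stand-in environment: a 5-state chain where only state 3 needs coordination,
    so CL 'helps' there and IL suffices everywhere else."""
    good = (mode == CL) if state == 3 else True
    reward = 1.0 if good else 0.0
    return (state + 1) % 5, reward

state = 0
for _ in range(5000):
    mode = choose_mode(state)
    next_state, team_reward = env_step(state, mode)
    update_switch(state, mode, team_reward, next_state)
    state = next_state

print({s: ("CL" if q[s][CL] > q[s][IL] else "IL") for s in range(5)})

On this toy chain, the switching agent learns to reserve CL for the single coordination state and to fall back on the cheaper IL mode everywhere else, mirroring the budget-aware behaviour the abstract describes.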
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)