Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning

Peihong Yu; Amrit Bedi; Alec Koppel; Carl Busart; Priya Narayan; Dinesh Manocha; Pratap Tokekar

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Multi-agent Reinforcement Learning, Learning from Demonstration, Actor-Critic

Abstract: Multi-Agent Reinforcement Learning (MARL) algorithms struggle to explore efficiently because the joint state-action space grows exponentially. While demonstration-guided learning has proven beneficial in single-agent settings, its direct application to MARL is hindered by the practical difficulty of obtaining joint expert demonstrations. In this work, we introduce personalized expert demonstrations, which are provided by an agent-specific expert and tailored to an individual agent or, more broadly, to an individual type of agent in a heterogeneous team. These demonstrations pertain solely to single-agent behaviors and contain no cooperative elements, so they may not be optimal when used in a cooperative setting. To bootstrap learning from personalized expert demonstrations, we reformulate the MARL problem in occupancy measure space and propose two algorithms, expert-guided MARL (EG-MARL) and Generalized EG-MARL (GEG-MARL). These algorithms learn personalized reward signals from demonstrations to guide each agent's exploration, and use environmental reward feedback to foster collaborative behavior. We evaluate the proposed algorithms in both discrete and continuous environments. The results show that our methods can learn near-optimal policies even when given suboptimal demonstrations, and that they excel at coordinated tasks that challenge state-of-the-art MARL algorithms.
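To make the term "occupancy measure" concrete, the following is a minimal sketch of how an empirical discounted state-action occupancy could be estimated from one agent's personalized demonstrations, and how it might be mixed with the shared environment reward. It is not the EG-MARL or GEG-MARL algorithm, which the abstract does not specify; the function names, the discount `gamma`, the weight `eta`, and the additive mixing rule are all assumptions made for illustration.

```python
from collections import defaultdict


def occupancy_measure(trajectories, gamma=0.9):
    """Empirical discounted state-action occupancy for a single agent.

    `trajectories` is a list of demonstrations, each a list of
    (state, action) pairs. Each visit at step t contributes gamma**t,
    and the result is normalized to sum to 1.
    """
    weights = defaultdict(float)
    for trajectory in trajectories:
        for t, (state, action) in enumerate(trajectory):
            weights[(state, action)] += gamma ** t
    total = sum(weights.values())
    return {pair: weight / total for pair, weight in weights.items()}


def guided_reward(env_reward, state, action, demo_occupancy, eta=0.5):
    """Hypothetical mixing rule (an assumption, not the paper's reward).

    Adds a bonus proportional to how often the personalized expert
    visited (state, action); unseen pairs receive no bonus.
    """
    return env_reward + eta * demo_occupancy.get((state, action), 0.0)


# One short demonstration for one agent in a toy corridor.
demos = [[("s0", "right"), ("s1", "right")]]
occupancy = occupancy_measure(demos, gamma=0.5)
shaped = guided_reward(1.0, "s0", "right", occupancy, eta=0.5)
```

In this sketch the demonstration term only shapes an individual agent's exploration, while any cooperative signal still has to come from `env_reward`, mirroring the split between personalized guidance and environmental feedback described in the abstract.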

Supplementary Material: zip

Primary Area: reinforcement learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 6331
