Keywords: multi-agent reinforcement learning, zero-shot coordination, conditional policy similarity
Abstract: Multi-Agent Reinforcement Learning (MARL) in cooperative tasks usually follows the self-play setting, where agents are trained by playing with a fixed group of partners. However, when faced with Zero-Shot Coordination (ZSC), where an agent must coordinate with unseen partners, self-play agents may fail. ZSC performance is traditionally measured by cross-play, where independently trained agents are paired with each other. However, cross-play scores vary widely across different combinations of agents, so a model's cross-play score averaged over several partners is not, by itself, a reliable measure of its ZSC performance. We hypothesize that this is because the cross-play score is strongly tied to the similarity between an agent's training partner and its ZSC partner, and this similarity varies widely. We therefore define the Conditional Policy Similarity between an agent's Training partner and Testing partner (CPSTT) and conduct extensive experiments that confirm a strong linear correlation between CPSTT and cross-play score. Based on this finding, we propose a new criterion for evaluating ZSC performance: one model is considered better than another if it achieves a higher cross-play score at the same CPSTT. Furthermore, we propose a Similarity-Based Robust Training (SBRT) scheme that improves agents' ZSC performance by perturbing their partners' actions during training according to a predefined CPSTT value. We apply SBRT to four MARL frameworks, and their ZSC performance improves whether measured by the traditional criterion or ours.
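The abstract does not spell out how CPSTT is estimated or how SBRT perturbs partner actions, so the following is only a minimal sketch of one plausible reading, assuming discrete actions; the helper names `estimate_cpstt` and `sbrt_perturb` are hypothetical, not from the paper.

```python
import numpy as np

def estimate_cpstt(train_partner, test_partner, states):
    """Hypothetical CPSTT estimator: the fraction of sampled states on
    which the training partner and the testing partner pick the same
    action. (The paper's exact definition may instead condition on the
    ego agent's on-policy state distribution.)"""
    agree = [train_partner(s) == test_partner(s) for s in states]
    return float(np.mean(agree))

def sbrt_perturb(partner_action, n_actions, target_cpstt, rng):
    """SBRT-style perturbation sketch: with probability
    (1 - target_cpstt), replace the training partner's action with a
    uniformly random *different* action, so agreement with the
    unperturbed partner equals target_cpstt in expectation."""
    if rng.random() > target_cpstt:
        alt = int(rng.integers(n_actions - 1))
        return alt + (alt >= partner_action)  # skip the original action
    return partner_action

# Usage sketch: during training, the ego agent would observe
# sbrt_perturb(a_partner, n_actions, 0.8, np.random.default_rng(0))
# instead of the partner's raw action a_partner.
```

Sampling the replacement from the remaining n_actions - 1 actions (rather than all n) makes the expected agreement rate exactly equal to target_cpstt.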
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)