Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination

Sep 29, 2021 (edited Nov 12, 2021) | ICLR 2022 Submitted | Readers: Everyone
  • Keywords: Human-AI Coordination, Reinforcement Learning, Zero-Shot Human-AI Coordination, Deep Reinforcement Learning
  • Abstract: An AI agent should be able to coordinate with humans to solve tasks. We consider the problem of training a Reinforcement Learning (RL) agent to collaborate with humans without using any human data, i.e., in a zero-shot setting. Standard RL agents learn through self-play. Unfortunately, these agents only know how to collaborate with themselves and normally do not perform well with unseen partners, such as humans. How to train a robust agent in a zero-shot fashion remains an open research question. Motivated by maximum entropy RL, we derive a centralized population entropy objective to facilitate learning of a diverse population of agents, which is later used to train a robust AI agent to collaborate with unseen partners. The proposed method proves effective compared with baseline methods, including self-play PPO, standard Population-Based Training (PBT), and trajectory diversity-based PBT, in the popular Overcooked game environment. We also conduct online experiments with real humans and further demonstrate the efficacy of the method in the real world.
  • One-sentence Summary: This paper introduces Maximum Entropy Population-based training (MEP), a deep reinforcement learning method for robust human-AI coordination.
  • Supplementary Material: zip
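
The abstract mentions a population entropy objective but does not define it on this page. As a rough illustration only, the sketch below assumes one plausible form: the Shannon entropy of the population's mean policy at a single state. The function names and the example action distributions are hypothetical and are not taken from the paper or its supplementary material.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_policy(policies):
    """Uniform mixture of every agent's action distribution at one state."""
    n = len(policies)
    return [sum(column) / n for column in zip(*policies)]


def population_entropy(policies):
    """Assumed form: entropy of the mean policy of the population."""
    return entropy(mean_policy(policies))


# Hypothetical two-agent populations over two actions at a single state.
identical = [[0.9, 0.1], [0.9, 0.1]]  # both agents prefer action 0
diverse = [[0.9, 0.1], [0.1, 0.9]]    # agents prefer different actions

print(population_entropy(identical))
print(population_entropy(diverse))
```

Under this assumed definition, a population whose agents prefer different actions scores higher than one whose agents behave alike, which is the kind of diversity the abstract says the objective is meant to encourage.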