Human-Level Performance in No-Press Diplomacy via Equilibrium SearchDownload PDF

28 Sep 2020 (modified: 25 Jan 2021)ICLR 2021 OralReaders: Everyone
  • Keywords: multi-agent systems, regret minimization, no-regret learning, game theory, reinforcement learning
  • Abstract: Prior AI breakthroughs in complex games have focused on either the purely adversarial or purely cooperative settings. In contrast, Diplomacy is a game of shifting alliances that involves both cooperation and competition. For this reason, Diplomacy has proven to be a formidable research challenge. In this paper we describe an agent for the no-press variant of Diplomacy that combines supervised learning on human data with one-step lookahead search via external regret minimization. External regret minimization techniques have been behind previous AI successes in adversarial games, most notably poker, but have not previously been shown to be successful in large-scale games involving cooperation. We show that our agent greatly exceeds the performance of past no-press Diplomacy bots, is unexploitable by expert humans, and achieves a rank of 23 out of 1,128 human players when playing anonymous games on a popular Diplomacy website.
  • One-sentence Summary: We present an agent that approximates a one-step equilibrium in no-press Diplomacy using no-regret learning and show that it exceeds human-level performance
  • Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
  • Supplementary Material: zip
12 Replies