- TL;DR: This paper extends the multi-agent generative adversarial imitation learning to extensive-form Markov games.
- Abstract: Imitation learning aims to inversely learn a policy from expert demonstrations, which has been extensively studied in the literature for both single-agent setting with Markov decision process (MDP) model, and multi-agent setting with Markov game (MG) model. However, existing approaches for general multi-agent Markov games are not applicable to multi-agent extensive Markov games, where agents make asynchronous decisions following a certain order, rather than simultaneous decisions. We propose a novel framework for asynchronous multi-agent generative adversarial imitation learning (AMAGAIL) under general extensive Markov game settings, and the learned expert policies are proven to guarantee subgame perfect equilibrium (SPE), a more general and stronger equilibrium than Nash equilibrium (NE). The experiment results demonstrate that compared to state-of-the-art baselines, our AMAGAIL model can better infer the policy of each expert agent using their demonstration data collected from asynchronous decision-making scenarios (i.e., extensive Markov games).
- Code: https://www.dropbox.com/sh/ngklsadt974d2pw/AAAVArZfh7W2GBjiNv2YPhcja?dl=0
- Keywords: Multi-agent, Imitation Learning, Inverse Reinforcement Learning