Learning a Behavioral Repertoire from Demonstrations

25 Sep 2019 (modified: 24 Dec 2019) | ICLR 2020 Conference Blind Submission | Readers: Everyone
  • Keywords: Behavioral Repertoires, Imitation Learning, Deep Learning, Adaptation, StarCraft 2
  • TL;DR: BRIL allows a single neural network to learn, from a set of demonstrations, a repertoire of behaviors that can be precisely modulated.
  • Abstract: Imitation Learning (IL) is a machine learning approach for learning a policy from a set of demonstrations. IL can be used to kick-start learning before applying reinforcement learning (RL), but it can also be useful on its own, e.g. to learn to imitate human players in video games. However, a major limitation of current IL approaches is that they learn only a single "average" policy from a dataset that may contain demonstrations of many different types of behavior. In this paper, we present a new approach called Behavioral Repertoire Imitation Learning (BRIL) that instead learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy, conditioned on a behavior description, that can be precisely modulated. We apply this approach to train a policy on 7,777 human demonstrations for the build-order planning task in StarCraft II. Dimensionality reduction techniques are applied to construct a low-dimensional behavioral space from the high-dimensional army unit composition of each demonstration. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, the policy can adapt its behavior between games to reach a performance beyond that of the traditional IL baseline approach.
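
The two mechanisms named in the abstract, conditioning the policy input on a behavior description and choosing that description between games with UCB1, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `augment` and `UCB1`, the three 2-D behavior descriptors, the placeholder state vector, and the simulated win probabilities are all hypothetical stand-ins chosen for illustration, and the exploration term is the standard textbook UCB1 bound.

```python
import math
import random


def augment(state, behavior):
    """Concatenate a state vector with a behavior descriptor (assumed encoding)."""
    return list(state) + list(behavior)


class UCB1:
    """Textbook UCB1 bandit over a discrete set of candidate behaviors."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # games played with each behavior
        self.totals = [0.0] * n_arms  # summed reward for each behavior

    def select(self):
        # Try every behavior once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        t = sum(self.counts)
        scores = [
            self.totals[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(t) / self.counts[a])
            for a in range(len(self.counts))
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


# Hypothetical behavior descriptors (points in a 2-D behavioral space)
# and made-up win probabilities used only to simulate game outcomes.
behaviors = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.2)]
win_prob = [0.2, 0.5, 0.8]

rng = random.Random(0)
bandit = UCB1(len(behaviors))
for game in range(500):
    arm = bandit.select()
    # The conditioned policy would receive this vector at every step of the game.
    policy_input = augment([0.0, 1.0, 0.0], behaviors[arm])
    reward = 1.0 if rng.random() < win_prob[arm] else 0.0  # simulated win/loss
    bandit.update(arm, reward)
```

In this sketch the behavior is held fixed for a whole game and only the game outcome updates the bandit, which mirrors the between-games adaptation described in the abstract; how the candidate behaviors are chosen from the behavioral space is left open here.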