Robust Imitation via Mirror Descent Inverse Reinforcement Learning


Sep 29, 2021 (edited Oct 06, 2021) · ICLR 2022 Conference Blind Submission
  • Keywords: inverse reinforcement learning, reward learning, regularized Markov decision processes, imitation learning
  • Abstract: Adversarial imitation learning techniques are based on modeling statistical divergences using agent and expert demonstration data. However, unbiased minimization of these divergences is not generally guaranteed because of the geometry of the underlying space. Furthermore, when demonstrations are insufficient, reward functions estimated from the discriminative signals become uncertain and fail to provide informative feedback. Instead of formulating a global cost at once, we consider reward functions as an iterative sequence in a proximal method. In this paper, we show that rewards derived by mirror descent ensure minimization of a Bregman divergence, with a rigorous regret bound of $\mathcal{O}(1/T)$ under a particular condition on the step sizes $\{\eta_t\}_{t=1}^T$. The resulting mirror descent adversarial inverse reinforcement learning (MD-AIRL) algorithm gradually advances a parameterized reward function in an associated reward space, and the sequence of such functions provides optimization targets for the policy space. We empirically validate our method on discrete and continuous benchmarks and show that MD-AIRL outperforms previous methods in various settings.
  • One-sentence Summary: We present a new online IRL algorithm that provides iterative proximal optimization targets.
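The abstract frames reward learning as a sequence of proximal steps under a Bregman divergence. The sketch below is a generic illustration only. It is not MD-AIRL, which operates on parameterized reward functions and policies. It runs entropic mirror descent on the probability simplex. There, the Bregman divergence induced by negative entropy is the KL divergence, so each proximal step $x_{t+1} = \arg\min_x \eta_t \langle \nabla f(x_t), x \rangle + D_{\mathrm{KL}}(x \,\|\, x_t)$ has the closed form $x_{t+1} \propto x_t \exp(-\eta_t \nabla f(x_t))$. Several choices here are assumptions made purely for illustration: the objective $f(x) = D_{\mathrm{KL}}(x \,\|\, q)$, the "expert-like" target `q`, the starting point, and the schedule $\eta_t = 1/(t+2)$. None of them is the paper's step-size condition.

```python
import numpy as np


def kl(p, q):
    """KL divergence D_KL(p || q) between strictly positive distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


def entropic_mirror_descent(grad, x0, step_sizes):
    """Mirror descent on the simplex with the negative-entropy mirror map.

    Each step solves argmin_x eta * <grad(x_t), x> + KL(x || x_t), whose
    closed form is x_{t+1} proportional to x_t * exp(-eta * grad(x_t)).
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x]
    for eta in step_sizes:
        logits = np.log(x) - eta * grad(x)
        logits -= logits.max()          # shift for numerical stability
        x = np.exp(logits)
        x /= x.sum()                    # renormalize onto the simplex
        iterates.append(x)
    return iterates


# Illustrative setup (assumed, not from the paper): pull x toward target q.
q = np.array([0.2, 0.3, 0.5])
grad_kl = lambda x: np.log(x) - np.log(q) + 1.0   # gradient of KL(x || q)
steps = [1.0 / (t + 2) for t in range(200)]       # hypothetical schedule

iterates = entropic_mirror_descent(grad_kl, [0.7, 0.2, 0.1], steps)
losses = [kl(x, q) for x in iterates]
print(f"KL at start: {losses[0]:.4f}, after {len(steps)} steps: {losses[-1]:.2e}")
```

With this schedule, $\log x_T = \tfrac{1}{T+1}\log x_0 + \tfrac{T}{T+1}\log q + \text{const}$. Each iterate is therefore a geometric interpolation that moves steadily toward `q`, and the KL objective decreases at every step.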