Contextual Inverse Reinforcement Learning

Philip Korsunsky; Stav Belogolovsky; Tom Zahavy; Chen Tessler; Shie Mannor

Contextual Inverse Reinforcement Learning

Philip Korsunsky, Stav Belogolovsky, Tom Zahavy, Chen Tessler, Shie Mannor

25 Sept 2019 (modified: 05 May 2023)ICLR 2020 Conference Blind SubmissionReaders: Everyone

Keywords: Contextual MDP, Inverse Reinforcement Learning, Reinforcement Learning, Mirror Descent

TL;DR: We analyze contextual Markov decision processes in an inverse reinforcement learning setting. We propose and analyze several algorithms both theoretically and empirically.

Abstract: We consider the Inverse Reinforcement Learning problem in Contextual Markov Decision Processes. In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context. There is also an “expert” who knows this mapping and acts according to the optimal policy for each context. The goal of the agent is to learn the expert’s mapping by observing demonstrations. We define an optimization problem for finding this mapping and show that when it is linear, the problem is convex. We present and analyze the sample complexity of three algorithms for solving this problem: the mirrored descent algorithm, evolution strategies, and the ellipsoid method. We also extend the first two methods to work with general reward functions, e.g., deep neural networks, but without the theoretical guarantees. Finally, we compare the different techniques empirically in driving simulation and a medical treatment regime.

Original Pdf: pdf

17 Replies

Loading