- Keywords: Contextual MDP, Inverse Reinforcement Learning, Reinforcement Learning, Mirror Descent
- TL;DR: We analyze contextual Markov decision processes in an inverse reinforcement learning setting. We propose and analyze several algorithms both theoretically and empirically.
- Abstract: We consider the Inverse Reinforcement Learning problem in Contextual Markov Decision Processes. In this setting, the reward, which is unknown to the agent, is a function of a static parameter referred to as the context. There is also an “expert” who knows this mapping and acts according to the optimal policy for each context. The goal of the agent is to learn the expert’s mapping by observing demonstrations. We define an optimization problem for finding this mapping and show that when it is linear, the problem is convex. We present and analyze the sample complexity of three algorithms for solving this problem: the mirrored descent algorithm, evolution strategies, and the ellipsoid method. We also extend the first two methods to work with general reward functions, e.g., deep neural networks, but without the theoretical guarantees. Finally, we compare the different techniques empirically in driving simulation and a medical treatment regime.