Keywords: Contextual MDP, Inverse Reinforcement Learning, Reinforcement Learning, Mirror Descent
TL;DR: We analyze contextual Markov decision processes in an inverse reinforcement learning setting. We propose and analyze several algorithms both theoretically and empirically.
Abstract: We consider the Inverse Reinforcement Learning problem in Contextual Markov
Decision Processes. In this setting, the reward, which is unknown to the agent, is a
function of a static parameter referred to as the context. There is also an “expert”
who knows this mapping and acts according to the optimal policy for each context.
The goal of the agent is to learn the expert’s mapping by observing demonstrations.
We define an optimization problem for finding this mapping and show that when
it is linear, the problem is convex. We present and analyze the sample complexity
of three algorithms for solving this problem: the mirror descent algorithm,
evolution strategies, and the ellipsoid method. We also extend the first two methods
to work with general reward functions, e.g., deep neural networks, but without the
theoretical guarantees. Finally, we compare the different techniques empirically in
a driving simulation and a medical treatment regime.
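
The convexity claim for a linear mapping can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the reward form r_c(s) = c^T W phi(s), the loss (best candidate value minus expert value), the random feature expectations, and the names `contextual_reward` and `irl_loss`. Under these assumptions, each policy's value is linear in W, so the loss is a maximum of linear functions of W and therefore convex.

```python
import numpy as np

def contextual_reward(W, context, features):
    """Per-state reward r_c(s) = c^T W phi(s) (assumed linear form)."""
    return features @ (W.T @ context)

def irl_loss(W, context, candidate_mus, expert_mu):
    """Best candidate value minus expert value for one context.

    Each policy's value is c^T W mu, which is linear in W, so the
    maximum over policies is convex in W.
    """
    w = W.T @ context  # reward weights induced by this context
    return float(np.max(candidate_mus @ w) - expert_mu @ w)

rng = np.random.default_rng(0)
d_context, d_feat = 3, 4
context = rng.random(d_context)

# Hypothetical feature expectations of 5 policies; the expert is one of them.
candidate_mus = rng.random((5, d_feat))
expert_mu = candidate_mus[2]

W_a = rng.normal(size=(d_context, d_feat))
W_b = rng.normal(size=(d_context, d_feat))
W_mid = 0.5 * (W_a + W_b)

loss_a = irl_loss(W_a, context, candidate_mus, expert_mu)
loss_b = irl_loss(W_b, context, candidate_mus, expert_mu)
loss_mid = irl_loss(W_mid, context, candidate_mus, expert_mu)

# Midpoint convexity check: loss at the average is at most the average loss.
print(loss_mid <= 0.5 * (loss_a + loss_b) + 1e-12)
```

Because the expert is included among the candidate policies here, the loss is never negative, and it reaches zero when the expert is optimal under the reward induced by W.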