Abstract: In this work, we study an inverse reinforcement learning (IRL) problem in which the experts plan \textit{under a shared reward function but with different, unknown planning horizons}. Without knowledge of the discount factors, the feasible solution set of the reward function becomes larger, making the reward function harder to identify. To overcome this challenge, we develop an algorithm that, in practice, learns a reward function close to the true one. We give an empirical characterization of the identifiability and generalizability of the feasible set of reward functions.
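As a rough illustration of why unknown discount factors enlarge the feasible set (the notation below is standard IRL convention for a single expert and is an assumption, not taken from the submission): for a fixed discount $\gamma$, a reward $r$ is feasible when the expert policy $\pi_E$ is optimal under $(r, \gamma)$; once $\gamma$ is unknown, any $r$ that rationalizes $\pi_E$ for \textit{some} admissible discount becomes feasible:
\[
\mathcal{R}(\gamma) = \bigl\{\, r : Q^{\pi_E}_{r,\gamma}\bigl(s, \pi_E(s)\bigr) \ge Q^{\pi_E}_{r,\gamma}(s, a) \ \ \forall s, a \,\bigr\},
\qquad
\mathcal{R} \;=\; \bigcup_{\gamma \in (0,1)} \mathcal{R}(\gamma) \;\supseteq\; \mathcal{R}(\gamma^{*}),
\]
where $\gamma^{*}$ is the expert's true (unknown) discount factor. Taking the union over candidate discounts can only grow the set of rewards consistent with the demonstrations, which is the source of the identifiability difficulty.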
Submission Number: 53