Keywords: inverse reinforcement learning, active learning, imitation learning, Bayesian methods
TL;DR: Information-theoretic method for active Bayesian inverse reinforcement learning
Abstract: As AI systems become increasingly autonomous, aligning their decision-making with human preferences is essential. In domains such as autonomous driving or robotics, it is impossible to write down by hand the reward function representing these preferences. Inverse reinforcement learning (IRL) offers a promising approach to infer the unknown reward from demonstrations. However, obtaining human demonstrations can be costly. Active IRL addresses this challenge by strategically selecting the most informative scenarios for human demonstration, reducing the required human effort. As a principled alternative to prior heuristic approaches, we introduce two information-theoretic methods for Active IRL that, at every step, maximise information about either the reward or the regret, directly targeting the reward-learning or the apprenticeship-learning objective, respectively. We prove that our method yields a probably-approximately-correct (PAC) policy -- the first such guarantee for this task. We also illustrate failure modes of prior methods and provide an experimental comparison.
Submission Number: 286
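Illustrative sketch (not from the submission): one way to read the abstract's "maximise information about the reward at every step" is as expected-information-gain query selection under a Bayesian reward posterior with a Boltzmann-rational demonstrator model. All function names, the discrete posterior representation, and the toy example below are assumptions for illustration only.

import numpy as np

def entropy(p):
    # Shannon entropy of a discrete distribution, ignoring zero entries
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def action_likelihoods(Q, beta=5.0):
    # Boltzmann action distribution per state from a Q-table of shape (S, A)
    z = beta * Q
    z -= z.max(axis=1, keepdims=True)
    expz = np.exp(z)
    return expz / expz.sum(axis=1, keepdims=True)

def expected_info_gain(posterior, Q_per_reward, state):
    # Expected reduction in posterior entropy from observing the expert's
    # action at `state`; posterior has shape (R,), Q_per_reward is a list of
    # (S, A) Q-tables, one per reward hypothesis.
    prior_H = entropy(posterior)
    lik = np.stack([action_likelihoods(Q)[state] for Q in Q_per_reward])  # (R, A)
    marginal = posterior @ lik                                            # (A,)
    gain = 0.0
    for a, p_a in enumerate(marginal):
        if p_a == 0:
            continue
        post_a = posterior * lik[:, a] / p_a   # Bayes update given action a
        gain += p_a * (prior_H - entropy(post_a))
    return gain

def select_query(posterior, Q_per_reward, candidate_states):
    # Pick the state whose demonstration is expected to be most informative
    gains = [expected_info_gain(posterior, Q_per_reward, s) for s in candidate_states]
    return candidate_states[int(np.argmax(gains))]

# Toy usage: 3 reward hypotheses over a 4-state, 2-action MDP
rng = np.random.default_rng(0)
posterior = np.ones(3) / 3
Q_tables = [rng.normal(size=(4, 2)) for _ in range(3)]
print(select_query(posterior, Q_tables, candidate_states=[0, 1, 2, 3]))

The regret-targeting variant mentioned in the abstract would score queries by information about regret rather than about the reward itself; the sketch above covers only the reward-learning objective.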