Walking the Values in Bayesian Inverse Reinforcement Learning

Ondrej Bajgar; Alessandro Abate; Konstantinos Gatsis; Michael A Osborne

Published: 26 Apr 2024, Last Modified: 15 Jul 2024; UAI 2024 spotlight; License: CC BY 4.0

Keywords: Bayesian methods, inverse reinforcement learning, imitation learning

TL;DR: An MCMC-based Bayesian inverse reinforcement learning method for continuous environments

Abstract: The goal of Bayesian inverse reinforcement learning (IRL) is to recover a posterior distribution over reward functions from a set of demonstrations by an expert who optimizes a reward unknown to the learner. The resulting posterior over rewards can then be used to synthesize an apprentice policy that performs well on the same or a similar task. A key challenge in Bayesian IRL is bridging the computational gap between the hypothesis space of possible rewards and the likelihood, which is often defined in terms of Q-values: vanilla Bayesian IRL must solve the costly forward planning problem (going from rewards to Q-values) at every step of the algorithm, a computation that may need to be repeated thousands of times. We address this with a simple change: instead of sampling primarily in the space of rewards, we work primarily in the space of Q-values, since going from Q-values to rewards is radically cheaper. Furthermore, this reversal of the computation makes the gradient easy to compute, which enables efficient sampling with Hamiltonian Monte Carlo. We propose ValueWalk, a new Markov chain Monte Carlo method based on this insight, and illustrate its advantages on several tasks.
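The abstract's central claim is that mapping Q-values back to rewards is much cheaper than planning from rewards to Q-values. The sketch below illustrates that asymmetry on a small tabular MDP, under assumptions that are ours rather than the paper's: known transition probabilities `P`, a discount factor `gamma`, and the hard-max Bellman optimality equation. The hypothetical `q_from_reward` runs value iteration (the forward planning step), while the hypothetical `reward_from_q` inverts the Bellman equation in a single pass. The hypothetical `boltzmann_log_likelihood` shows a Boltzmann-rational demonstration likelihood, a common choice in Bayesian IRL, evaluated directly from Q. This is an illustration of the idea, not ValueWalk's implementation.

```python
import math
import random


def q_from_reward(reward, P, gamma, tol=1e-10, max_iters=10_000):
    """Forward planning (value iteration): apply the Bellman optimality operator
    until Q stops changing. Each sweep costs O(|S|^2 |A|), and many sweeps
    may be needed, which is the expensive step in vanilla Bayesian IRL."""
    n_s, n_a = len(reward), len(reward[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(max_iters):
        v = [max(row) for row in q]  # V(s') = max_a' Q(s', a')
        new_q = [[reward[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
                  for a in range(n_a)] for s in range(n_s)]
        delta = max(abs(new_q[s][a] - q[s][a]) for s in range(n_s) for a in range(n_a))
        q = new_q
        if delta < tol:
            break
    return q


def reward_from_q(q, P, gamma):
    """Inverse direction: a single application of the Bellman equation, no loop.
    r(s, a) = Q(s, a) - gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')."""
    v = [max(row) for row in q]
    return [[q[s][a] - gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
             for a in range(len(q[0]))] for s in range(len(q))]


def boltzmann_log_likelihood(q, demos, beta):
    """Log-likelihood of (state, action) pairs under pi(a | s) proportional to
    exp(beta * Q(s, a)); uses a stable log-sum-exp and needs no planning."""
    total = 0.0
    for s, a in demos:
        m = max(beta * x for x in q[s])
        log_z = m + math.log(sum(math.exp(beta * x - m) for x in q[s]))
        total += beta * q[s][a] - log_z
    return total


# Usage on a random 4-state, 2-action MDP (illustrative data only).
random.seed(0)
n_s, n_a, gamma = 4, 2, 0.9
P = []
for s in range(n_s):
    row = []
    for a in range(n_a):
        w = [random.random() for _ in range(n_s)]
        z = sum(w)
        row.append([x / z for x in w])  # normalize to a probability vector
    P.append(row)

true_r = [[random.uniform(-1.0, 1.0) for _ in range(n_a)] for _ in range(n_s)]
q = q_from_reward(true_r, P, gamma)        # expensive: iterative planning
recovered = reward_from_q(q, P, gamma)     # cheap: one Bellman pass
err = max(abs(recovered[s][a] - true_r[s][a]) for s in range(n_s) for a in range(n_a))
print(err < 1e-6)  # True
```

A sampler that works in Q-space can therefore evaluate the likelihood and recover the corresponding rewards (for example, to score them under a prior over rewards) without an inner planning loop. Because the hard max is only piecewise differentiable, a smooth variant such as a log-sum-exp value may be more convenient when gradients are needed for Hamiltonian Monte Carlo; the sketch makes no claim about which variant ValueWalk uses.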

Submission Number: 696