Online Inverse Reinforcement Learning with Learned Observation Model

Saurabh Arora; Prashant Doshi; Bikramjit Banerjee

Published: 10 Sept 2022, Last Modified: 05 May 2023 | Venue: CoRL 2022 Poster | Readers: Everyone

Keywords: Observation model, Inverse reinforcement learning, Maximum entropy

Abstract: To extend incremental inverse reinforcement learning (I2RL) to real-world robotics applications with noisy observations and an unknown observation model, we introduce a new method (RIMEO) that approximates the observation model to best estimate the noise-free ground truth underlying the observations. RIMEO learns a maximum entropy distribution over the observation features governing the perception process, and then uses the inferred observation model to learn the reward function. We evaluate it experimentally on two robotics tasks: (1) post-harvest vegetable sorting with a Sawyer arm based on human demonstration, and (2) breaching a perimeter patrol by two Turtlebots. Our experiments show that RIMEO learns a more accurate policy than (a) a state-of-the-art IRL method that does not directly learn an observation model, and (b) a custom baseline that learns a less sophisticated observation model. Furthermore, we show that RIMEO admits formal guarantees of monotonic convergence and a sample complexity bound.
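To make the phrase "maximum entropy distribution over the observation features" concrete, the following is a minimal sketch of generic maximum entropy fitting by feature-expectation matching. It is not the RIMEO algorithm from the paper. The four observations, the two binary features, the target expectations, and the names `maxent_distribution`, `expected_features`, and `fit_maxent` are all hypothetical and chosen only for illustration.

```python
import math

def maxent_distribution(weights, features):
    """Return P(o) proportional to exp(w . phi(o)) over a finite observation set."""
    scores = [sum(w * f for w, f in zip(weights, phi)) for phi in features]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def expected_features(probs, features):
    """Expected value of each feature under the given distribution."""
    n = len(features[0])
    return [sum(p * phi[k] for p, phi in zip(probs, features)) for k in range(n)]

def fit_maxent(features, target, lr=0.5, steps=500):
    """Gradient ascent on the log-likelihood of an exponential-family model.

    The gradient is (target expectations - model expectations), so the
    fitted distribution matches the target feature expectations.
    """
    weights = [0.0] * len(target)
    for _ in range(steps):
        probs = maxent_distribution(weights, features)
        model = expected_features(probs, features)
        weights = [w + lr * (t - m) for w, t, m in zip(weights, target, model)]
    return weights

# Hypothetical toy setup: four possible observations, each described by
# two binary observation features (assumed, not taken from the paper).
features = [[0, 0], [1, 0], [0, 1], [1, 1]]
# Assumed empirical feature expectations estimated from noisy observations.
target = [0.7, 0.4]

weights = fit_maxent(features, target)
probs = maxent_distribution(weights, features)
fitted = expected_features(probs, features)
```

Among all distributions that reproduce the target feature expectations, this exponential-family form is the one with the highest entropy. In this toy case the two features end up independent under the fitted distribution, since no constraint ties them together.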

Student First Author: yes

Supplementary Material: zip
