Student First Author: yes
Keywords: Observation model, Inverse reinforcement learning, Maximum entropy
Abstract: With the motivation of extending incremental inverse reinforcement learning (I2RL) to real-world robotics applications with noisy observations as well as an unknown observation model, we introduce a new method (RIMEO) that approximates the observation model in order to best estimate the noise-free ground truth underlying the observations. It learns a maximum entropy distribution over the observation features governing the perception process, and then uses the inferred observation model to learn the reward function. Experimental evaluation is performed in two robotics tasks: (1) post-harvest vegetable sorting with a Sawyer arm based on human demonstration, and (2) breaching a perimeter patrol by two Turtlebots. Our experiments reveal that RIMEO learns a more accurate policy compared to (a) a state-of-the-art IRL method that does not directly learn an observation model, and (b) a custom baseline that learns a less sophisticated observation model. Furthermore, we show that RIMEO admits formal guarantees of monotonic convergence and a sample complexity bound.
Supplementary Material: zip