Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement

Benjamin Eysenbach; Xinyang Geng; Sergey Levine; Ruslan Salakhutdinov

12 Jun 2020 (modified: 06 Apr 2025), LifelongML@ICML2020, Readers: Everyone

Student First Author: Yes

Keywords: multitask RL, inverse RL, hindsight relabeling

Abstract: Multi-task reinforcement learning (RL) aims to simultaneously learn policies for solving many tasks. Several prior works have found that relabeling past experience with different reward functions can improve sample efficiency. Relabeling methods typically pose the question: if, in hindsight, we assume that our experience was optimal for some task, for what task was it optimal? Inverse RL answers this question. In this paper, we show that inverse RL is a principled mechanism for reusing experience across tasks. We use this idea to generalize goal-relabeling techniques from prior work to arbitrary types of reward functions. Our experiments confirm that relabeling data using inverse RL outperforms prior relabeling methods on goal-reaching tasks, and accelerates learning in more general multi-task settings where prior methods are not applicable, such as domains with discrete sets of rewards and those with linear reward functions.
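The hindsight question in the abstract ("for what task was this experience optimal?") can be illustrated with a small sketch. This is not the authors' implementation. It assumes a discrete set of candidate tasks and a maximum-entropy-style posterior in which a task's probability is proportional to the exponentiated return of the trajectory under that task, minus a per-task normalizer. The function names, the `log_partition` argument, and the example numbers are all hypothetical.

```python
import math


def relabel_posterior(traj_returns, log_partition):
    """Posterior over candidate tasks for a single trajectory (illustrative).

    traj_returns[i]  : cumulative reward of the trajectory under task i
    log_partition[i] : assumed per-task normalizer; larger values mean task i
                       yields high return for most trajectories, so a high
                       return under it is weaker evidence
    """
    logits = [r - z for r, z in zip(traj_returns, log_partition)]
    shift = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - shift) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def relabel(traj_returns, log_partition):
    """Index of the most probable task under the posterior above."""
    posterior = relabel_posterior(traj_returns, log_partition)
    return max(range(len(posterior)), key=posterior.__getitem__)


# Hypothetical returns of one trajectory under three candidate tasks.
returns = [1.0, 5.0, 2.0]
uniform = [0.0, 0.0, 0.0]
task_without_normalizer = relabel(returns, uniform)
# If task 1 is "easy" (large normalizer), the choice can change.
task_with_normalizer = relabel(returns, [0.0, 10.0, 0.0])
```

Under these assumptions, the normalizer keeps every trajectory from being assigned to whichever task happens to give large rewards everywhere; without it, the relabeling reduces to picking the task with the highest return.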

TL;DR: Inverse RL is a principled way to figure out how to share experience across tasks.

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/rewriting-history-with-inverse-rl-hindsight/code)
