Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Anirudh Goyal; Philemon Brakel; William Fedus; Soumye Singhal; Timothy Lillicrap; Sergey Levine; Hugo Larochelle; Yoshua Bengio

Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Anirudh Goyal, Philemon Brakel, William Fedus, Soumye Singhal, Timothy Lillicrap, Sergey Levine, Hugo Larochelle, Yoshua Bengio

Published: 21 Dec 2018, Last Modified: 05 May 2023ICLR 2019 Conference Blind SubmissionReaders: Everyone

Abstract: In many environments only a tiny subset of all states yield high reward. In these cases, few of the interactions with the environment provide a relevant learning signal. Hence, we may want to preferentially train on those high-reward states and the probable trajectories leading to them. To this end, we advocate for the use of a \textit{backtracking model} that predicts the preceding states that terminate at a given high-reward state. We can train a model which, starting from a high value state (or one that is estimated to have high value), predicts and samples which (state, action)-tuples may have led to that high value state. These traces of (state, action) pairs, which we refer to as Recall Traces, sampled from this backtracking model starting from a high value state, are informative as they terminate in good states, and hence we can use these traces to improve a policy. We provide a variational interpretation for this idea and a practical algorithm in which the backtracking model samples from an approximate posterior distribution over trajectories which lead to large rewards. Our method improves the sample efficiency of both on- and off-policy RL algorithms across several environments and tasks.

Keywords: Model free RL, Variational Inference

TL;DR: A backward model of previous (state, action) given the next state, i.e. P(s_t, a_t | s_{t+1}), can be used to simulate additional trajectories terminating at states of interest! Improves RL learning efficiency.

14 Replies

Loading