TempoRL: Temporal Priors for Exploration in Off-Policy Reinforcement Learning

Marco Bagatella; Sammy Joe Christen; Otmar Hilliges

12 Oct 2021 (modified: 12 Oct 2025); Deep RL Workshop NeurIPS 2021; Readers: Everyone

Keywords: reinforcement learning, exploration, prior

TL;DR: We introduce state-independent temporal priors to accelerate RL in unseen tasks.

Abstract: Effective exploration is a crucial challenge in deep reinforcement learning. Behavioral priors have been shown to tackle this problem successfully, at the expense of reduced generality and restricted transferability. We thus propose temporal priors as a non-Markovian generalization of behavioral priors for guiding exploration in reinforcement learning. Critically, we focus on state-independent temporal priors, which exploit the idea of temporal consistency and are generally applicable and capable of transferring across a wide range of tasks. We show how dynamically sampling actions from a probabilistic mixture of policy and temporal prior can accelerate off-policy reinforcement learning in unseen downstream tasks. We provide empirical evidence that our approach improves upon strong baselines in long-horizon continuous control tasks under sparse reward settings.
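The mixture sampling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed mixing weight, the random-walk prior, the uniform policy, and every function name here are assumptions chosen for illustration. The abstract only states that actions are sampled dynamically from a probabilistic mixture of policy and temporal prior, and that the prior is state-independent.

```python
import numpy as np


def sample_mixture_action(policy_sample, prior_sample, prev_action, mix_weight, rng):
    """Draw one action from a two-component mixture.

    With probability `mix_weight` the action comes from the temporal prior,
    which sees only the previous action (no state). Otherwise it comes from
    the policy. A fixed `mix_weight` is an assumption of this sketch.
    """
    if rng.random() < mix_weight:
        return prior_sample(prev_action, rng), "prior"
    return policy_sample(rng), "policy"


def toy_prior(prev_action, rng, noise=0.1):
    # Hypothetical stand-in for a learned temporal prior: temporal
    # consistency is modeled as staying close to the previous action.
    step = noise * rng.standard_normal(prev_action.shape)
    return np.clip(prev_action + step, -1.0, 1.0)


def toy_policy(rng, dim=2):
    # Hypothetical stand-in for an untrained policy: uniform random actions.
    # A real policy would also condition on the current state.
    return rng.uniform(-1.0, 1.0, size=dim)


rng = np.random.default_rng(0)
action = np.zeros(2)
sources = []
for _ in range(1000):
    action, source = sample_mixture_action(toy_policy, toy_prior, action, 0.7, rng)
    sources.append(source)

# Fraction of steps in which the prior, rather than the policy, chose the action.
prior_fraction = sources.count("prior") / len(sources)
```

In this toy setup the prior contributes smooth, temporally correlated action sequences, while the policy component keeps the agent free to deviate from them.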

Supplementary Material: zip

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/temporl-temporal-priors-for-exploration-in/code)
