Decoupling Exploration and Exploitation in Reinforcement Learning

Lukas Schäfer; Filippos Christianos; Josiah Hanna; Stefano V Albrecht

Published: 22 Jul 2021; Last Modified: 04 May 2025; URL 2021 Poster; Readers: Everyone

Keywords: Reinforcement Learning, Exploration, Intrinsic Rewards, Decoupling

TL;DR: We propose Decoupled RL, which trains separate policies for exploration and exploitation using intrinsic rewards, aiming to address the hyperparameter sensitivity and non-stationarity of intrinsically motivated exploration in RL.

Abstract: Intrinsic rewards are commonly applied to improve exploration in reinforcement learning. However, these approaches suffer from instability caused by non-stationary reward shaping and a strong dependency on hyperparameters. In this work, we propose Decoupled RL (DeRL), which trains separate policies for exploration and exploitation. DeRL can be applied with on-policy and off-policy RL algorithms. We evaluate DeRL algorithms in two sparse-reward environments with multiple types of intrinsic rewards. We show that DeRL is more robust to the scaling and decay rate of intrinsic rewards, and that it converges to the same evaluation returns as intrinsically motivated baselines in fewer interactions.
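
The decoupling idea in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the chain environment, the count-based bonus `beta / sqrt(visits)`, the use of Q-learning for both tables, and every name below (`train_decoupled`, `q_explore`, `q_exploit`) are assumptions made for illustration. The exploration table learns from extrinsic plus intrinsic reward and selects all actions; the exploitation table learns off-policy from the same transitions using the extrinsic reward only, so the non-stationary bonus never enters its targets.

```python
import math
import random


def train_decoupled(n_states=8, episodes=500, max_steps=100, alpha=0.5,
                    gamma=0.95, beta=1.0, epsilon=0.2, seed=0):
    """Toy decoupled Q-learning on a sparse-reward chain (illustrative only)."""
    rng = random.Random(seed)
    goal = n_states - 1
    # Two separate value tables: actions are 0 (left) and 1 (right).
    q_explore = [[0.0, 0.0] for _ in range(n_states)]  # extrinsic + intrinsic
    q_exploit = [[0.0, 0.0] for _ in range(n_states)]  # extrinsic only
    visits = [0] * n_states  # state visit counts for the intrinsic bonus

    def step(state, action):
        # Deterministic chain; reward 1 only when the goal state is reached.
        nxt = min(state + 1, goal) if action == 1 else max(state - 1, 0)
        done = nxt == goal
        return nxt, (1.0 if done else 0.0), done

    def update(q, s, a, r, s2, done):
        # Standard one-step Q-learning update.
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Only the exploration table selects actions (epsilon-greedy,
            # with ties broken at random).
            if rng.random() < epsilon or q_explore[s][0] == q_explore[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q_explore[s][0] > q_explore[s][1] else 1
            s2, r_ext, done = step(s, a)
            visits[s2] += 1
            r_int = beta / math.sqrt(visits[s2])  # assumed count-based bonus
            update(q_explore, s, a, r_ext + r_int, s2, done)
            # Exploitation learns off-policy from the same transition,
            # without the intrinsic term.
            update(q_exploit, s, a, r_ext, s2, done)
            s = s2
            if done:
                break
    return q_explore, q_exploit
```

Calling `train_decoupled()` returns both tables; evaluation would act greedily with respect to `q_exploit` alone. In this sketch, changing `beta` or the decay of the bonus alters which data is collected but not the reward the exploitation table is trained on, which is the kind of robustness the abstract attributes to decoupling.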

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/decoupling-exploration-and-exploitation-in/code)
