An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control

Nicholas Ioannidis; Jonathan Wilder Lavington; Mark Schmidt

12 Oct 2021 (modified: 05 May 2023), Deep RL Workshop NeurIPS 2021

Keywords: deep reinforcement learning, soft actor-critic, off-policy reinforcement learning, non-uniform sampling

TL;DR: Correct use of non-uniform sampling in off-policy RL can improve performance, increase robustness to stochastic reward feedback, and reduce hyperparameter sensitivity.

Abstract: Off-policy reinforcement learning (RL) algorithms can take advantage of samples generated from all previous interactions with the environment through "experience replay". Such methods outperform almost all on-policy and model-based alternatives in complex tasks where a structured or well-parameterized model of the world does not exist. This makes them attractive to practitioners who lack domain-specific knowledge but still require high sample efficiency. However, this high performance can come at a cost: because of the additional hyperparameters introduced to learn function approximators efficiently, off-policy RL can perform poorly on new problems. To address this hyperparameter sensitivity, we show how the correct choice of non-uniform sampling for experience replay can stabilize model performance under varying environmental conditions and hyperparameters.
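The abstract contrasts uniform experience replay with non-uniform sampling but does not spell out a particular scheme. As one hypothetical illustration only, the sketch below implements proportional prioritized replay, a well-known non-uniform scheme in which priorities are p_i = (|TD error| + eps)^alpha and sampled transitions carry importance-sampling weights (N * P(i))^(-beta). This is not necessarily the sampling method studied in this paper; the names `ReplayBuffer`, `Transition`, `update_priorities`, and the default values of `alpha`, `beta`, and `eps` are all assumptions made for the example.

```python
import random
from typing import Any, List, NamedTuple, Optional, Tuple


class Transition(NamedTuple):
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class ReplayBuffer:
    """Ring-buffer experience replay with uniform or proportional prioritized sampling.

    Hypothetical sketch: priorities follow p_i = (|td_error| + eps) ** alpha and
    sampled transitions carry importance-sampling weights (N * P(i)) ** (-beta),
    normalized by the largest weight in the batch.
    """

    def __init__(self, capacity: int, alpha: float = 0.6, beta: float = 0.4,
                 eps: float = 1e-6, seed: Optional[int] = None) -> None:
        self.capacity = capacity
        self.alpha = alpha
        self.beta = beta
        self.eps = eps
        self.transitions: List[Transition] = []
        self.priorities: List[float] = []
        self.next_idx = 0  # slot that the next add() writes to once the buffer is full
        self.rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        # New data gets the current maximum priority so it is likely to be replayed soon.
        max_p = max(self.priorities, default=1.0)
        if len(self.transitions) < self.capacity:
            self.transitions.append(transition)
            self.priorities.append(max_p)
        else:  # buffer full: overwrite the oldest entry
            self.transitions[self.next_idx] = transition
            self.priorities[self.next_idx] = max_p
        self.next_idx = (self.next_idx + 1) % self.capacity

    def update_priorities(self, indices: List[int], td_errors: List[float]) -> None:
        # Larger TD error -> larger priority -> sampled more often.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha

    def sample(self, batch_size: int, uniform: bool = False
               ) -> Tuple[List[int], List[Transition], List[float]]:
        n = len(self.transitions)
        if n == 0:
            raise ValueError("cannot sample from an empty buffer")
        if uniform:
            # Standard uniform replay: every stored transition is equally likely.
            indices = [self.rng.randrange(n) for _ in range(batch_size)]
            return indices, [self.transitions[i] for i in indices], [1.0] * batch_size
        total = sum(self.priorities)
        probs = [p / total for p in self.priorities]
        indices = self.rng.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias introduced by non-uniform sampling.
        raw = [(n * probs[i]) ** (-self.beta) for i in indices]
        max_w = max(raw)
        weights = [w / max_w for w in raw]
        return indices, [self.transitions[i] for i in indices], weights
```

In an actor-critic training loop, one would call `sample()`, scale each transition's critic loss by its returned weight, and then pass the fresh TD errors back through `update_priorities()`. Calling `sample(batch_size, uniform=True)` recovers ordinary uniform replay, which makes it easy to compare the two regimes under the same hyperparameters.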

Supplementary Material: zip
