An Empirical Study of Non-Uniform Sampling in Off-Policy Reinforcement Learning for Continuous Control
Keywords: deep reinforcement learning, soft actor-critic, off-policy reinforcement learning, non-uniform sampling
TL;DR: Correct use of non-uniform sampling in off-policy RL can improve performance and increase robustness to stochastic reward feedback and hyper-parameter choices
Abstract: Off-policy reinforcement learning (RL) algorithms can take advantage of samples generated from all previous interactions with the environment through "experience replay". Such methods outperform almost all on-policy and model-based alternatives in complex tasks where a structured or well-parameterized model of the world does not exist. This makes them desirable for practitioners who lack domain-specific knowledge but still require high sample efficiency. However, this high performance can come at a cost. Because of the additional hyper-parameters introduced to efficiently learn function approximators, off-policy RL can perform poorly on new problems. To address this parameter sensitivity, we show how the correct choice of non-uniform sampling for experience replay can stabilize model performance under varying environmental conditions and hyper-parameters.
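As a rough illustration of the kind of non-uniform experience-replay sampling the abstract refers to (e.g., priority-proportional sampling in the style of prioritized experience replay), the sketch below shows one common way to implement it. This is not the paper's specific scheme; the class name, the `alpha` exponent, and the priority definition are assumptions made for the example.

```python
import numpy as np

class NonUniformReplayBuffer:
    """Minimal sketch: a fixed-size replay buffer that samples transitions
    with probability proportional to a per-transition priority (assumed
    here to be something like |TD error| + eps, set by the caller)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                       # exponent controlling how peaked the sampling is
        self.buffer = []                         # stored transitions
        self.priorities = np.zeros(capacity)     # priority for each slot
        self.pos = 0

    def add(self, transition, priority=1.0):
        # Insert (or overwrite, once full) a transition with its priority.
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = priority ** self.alpha
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sample indices non-uniformly, proportional to stored priorities.
        p = self.priorities[:len(self.buffer)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        batch = [self.buffer[i] for i in idx]
        # Importance-sampling weights correct for the non-uniform sampling bias.
        weights = (len(self.buffer) * probs[idx]) ** -1.0
        weights /= weights.max()
        return batch, idx, weights

    def update_priorities(self, idx, new_priorities):
        # Refresh priorities after each learning step (e.g. from new TD errors).
        self.priorities[idx] = np.asarray(new_priorities) ** self.alpha
```

In an off-policy learner such as soft actor-critic, the agent would push transitions into the buffer during interaction, draw non-uniform minibatches for critic and actor updates, and call `update_priorities` with the fresh TD errors of the sampled batch.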
Supplementary Material: zip