Towards Simplicity in Deep Reinforcement Learning: Streamlined Off-Policy Learning

Che Wang; Yanqiu Wu; Quan Vuong; Keith Ross

25 Sept 2019 (modified: 22 Jun 2025); ICLR 2020 Conference Blind Submission; Readers: Everyone

Keywords: Deep Reinforcement Learning, Sample Efficiency, Off-Policy Algorithms

TL;DR: We propose a new DRL off-policy algorithm achieving state-of-the-art performance.

Abstract: The field of Deep Reinforcement Learning (DRL) has recently seen a surge in the popularity of maximum entropy reinforcement learning algorithms. Their popularity stems from the intuitive interpretation of the maximum entropy objective and their superior sample efficiency on standard benchmarks. In this paper, we seek to understand the primary contribution of the entropy term to the performance of maximum entropy algorithms. For the MuJoCo benchmark, we demonstrate that the entropy term in Soft Actor-Critic (SAC) principally addresses the bounded nature of the action spaces. With this insight, we propose a simple normalization scheme that allows a streamlined algorithm without entropy maximization to match the performance of SAC. Our experimental results demonstrate a need to revisit the benefits of entropy regularization in DRL. We also propose a simple non-uniform sampling method for selecting transitions from the replay buffer during training. We further show that the streamlined algorithm with this non-uniform sampling scheme outperforms SAC and achieves state-of-the-art performance on challenging continuous control tasks.
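The abstract does not spell out the normalization scheme, so the following is only a minimal sketch of the kind of mechanism it points to, under stated assumptions: the policy network emits unbounded pre-activations that are squashed into the bounded action range with tanh, and the pre-activations are rescaled whenever their mean absolute value exceeds 1 so that tanh does not saturate. The function names and the threshold of 1 are illustrative assumptions, not taken from this page.

```python
import math


def normalize_preactivations(mu):
    """Rescale pre-tanh outputs when their mean absolute value exceeds 1.

    Assumption for illustration: the threshold (1.0) and the use of the
    mean absolute value are hypothetical choices, not confirmed by the page.
    """
    g = sum(abs(m) for m in mu) / len(mu)
    if g > 1.0:
        return [m / g for m in mu]
    return list(mu)


def squash(mu):
    """Map unbounded outputs into the bounded action range (-1, 1)."""
    return [math.tanh(m) for m in mu]


if __name__ == "__main__":
    raw = [8.0, -6.0, 4.0]  # large pre-activations: tanh is nearly saturated
    print("without normalization:", squash(raw))
    print("with normalization:   ", squash(normalize_preactivations(raw)))
```

Without rescaling, every component of the squashed action sits very close to the boundary of the action space; after rescaling, the components stay in the responsive region of tanh, which is the effect the abstract attributes to the entropy term in SAC.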

Code: https://anonymous.4open.science/r/e484a8c7-268a-4a66-a001-1e7676540237/

Community Implementations: [3 code implementations](https://www.catalyzex.com/paper/towards-simplicity-in-deep-reinforcement/code)


