Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

Chen Tessler; Nadav Merlis; Shie Mannor

25 Sept 2019 (modified: 05 May 2023), ICLR 2020 Conference Blind Submission, Readers: Everyone

Keywords: Deep Reinforcement Learning, Variance Reduction, Policy Gradient

TL;DR: We propose a conservative update rule for off-policy policy-gradient methods (e.g., DDPG) in order to reduce the variance of the training regime.

Abstract: In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms in complex domains. However, these algorithms lack the theoretical guarantees that are present in the tabular setting and suffer from many stability and reproducibility problems (Henderson et al., 2018). In this work, we suggest a simple approach for improving stability and providing probabilistic performance guarantees in off-policy actor-critic deep reinforcement learning regimes. Experiments on continuous action spaces, in the MuJoCo control suite, show that our proposed method reduces the variance of the training process and improves overall performance.

Code: https://anonymous.4open.science/r/1754f3b9-d618-4298-804c-c8e66d787fb7/
