Distributed Distributional Deterministic Policy Gradients

Gabriel Barth-Maron; Matthew W. Hoffman; David Budden; Will Dabney; Dan Horgan; Dhruva TB; Alistair Muldal; Nicolas Heess; Timothy Lillicrap

Distributed Distributional Deterministic Policy Gradients

Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap

15 Feb 2018 (modified: 22 Jun 2025)ICLR 2018 Conference Blind SubmissionReaders: Everyone

Abstract: This work adopts the very successful distributional perspective on reinforcement learning and adapts it to the continuous control setting. We combine this within a distributed framework for off-policy learning in order to develop what we call the Distributed Distributional Deep Deterministic Policy Gradient algorithm, D4PG. We also combine this technique with a number of additional, simple improvements such as the use of N-step returns and prioritized experience replay. Experimentally we examine the contribution of each of these individual components, and show how they interact, as well as their combined contributions. Our results show that across a wide variety of simple control tasks, difficult manipulation tasks, and a set of hard obstacle-based locomotion tasks the D4PG algorithm achieves state of the art performance.

TL;DR: We develop an agent that we call the Distributional Deterministic Deep Policy Gradient algorithm, which achieves state of the art performance on a number of challenging continuous control problems.

Keywords: policy gradient, continuous control, actor critic, reinforcement learning

Code: [![Papers with Code](/images/pwc_icon.svg) 4 community implementations](https://paperswithcode.com/paper/?openreview=SyZipzbCb)

Data: [MuJoCo](https://paperswithcode.com/dataset/mujoco)

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/distributed-distributional-deterministic/code)

12 Replies

Loading