Abstract: We develop a new model-free reinforcement learning (RL) algorithm: D2AC. Motivated by recent advances in iterative function approximation, we make two adjustments to the typical actor-critic RL pipeline. First, we learn distributional critics with a novel fusion of distributional RL and clipped double Q-learning. Second, we use a diffusion model to parameterize the policy and derive an efficient method for aligning the diffusion process with policy improvement. Together, these changes yield highly performant model-free policies on a benchmark of eighteen hard RL tasks, including Humanoid, Dog, and Shadow Hand domains, spanning both dense-reward and goal-conditioned RL scenarios. Beyond standard benchmarks, we also evaluate a biologically motivated predator-prey task to examine the behavioral robustness and generalization capacity of our approach.
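To make the first ingredient concrete, below is a minimal sketch (not the paper's exact formulation) of one way clipped double Q-learning can be fused with a quantile-based distributional critic: two critics each predict a set of return quantiles, and the Bellman target is built from the element-wise minimum over the two critics' quantiles. All names (QuantileCritic, N_QUANTILES, clipped_distributional_target) are illustrative assumptions, not identifiers from the paper.

```python
# Hedged sketch: clipped double Q-learning applied per-quantile to a
# distributional critic. Assumes PyTorch; details may differ from D2AC.
import torch
import torch.nn as nn

N_QUANTILES = 32  # number of return quantiles per critic (assumed)

class QuantileCritic(nn.Module):
    """Maps (state, action) to N_QUANTILES return quantiles."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, N_QUANTILES),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def clipped_distributional_target(target_q1, target_q2, next_state,
                                  next_action, reward, done, gamma=0.99):
    """Clipped double-Q target in distributional form: take the
    element-wise minimum of the two target critics' quantiles, then
    apply the usual Bellman backup."""
    with torch.no_grad():
        z1 = target_q1(next_state, next_action)   # (batch, N_QUANTILES)
        z2 = target_q2(next_state, next_action)
        z_min = torch.minimum(z1, z2)             # per-quantile clipping
        return reward.unsqueeze(-1) + gamma * (1.0 - done.unsqueeze(-1)) * z_min
```

The resulting target distribution would then be regressed with a quantile (e.g., quantile Huber) loss, analogously to how a scalar clipped double-Q target is used in standard actor-critic methods.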
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Manuel_Haussmann1
Submission Number: 5221