- Keywords: Robot Control, Policy Learning, Differentiable Simulation, Reinforcement Learning
- Abstract: Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work, we present SHAC, a short-horizon actor-critic method that successfully leverages parallel differentiable simulation to accelerate policy learning. Our method alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17x reduction in training time over the best-performing established RL algorithm. More visual results are provided at: https://sites.google.com/view/shac
- One-sentence Summary: We propose an efficient policy learning method leveraging the recent advance of differentiable simulation, and our method outperforms state-of-the-art algorithms in both sample efficiency and wall clock time on multiple challenging control tasks.
- Supplementary Material: zip