Data-efficient Deep Reinforcement Learning for Dexterous Manipulation

Ivo Popov, Nicolas Heess, Timothy P. Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Tom Erez, Yuval Tassa, Martin Riedmiller

Feb 15, 2018 (modified: Feb 15, 2018) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: Grasping an object and precisely stacking it on another is a difficult task for traditional robotic control or hand-engineered approaches. Here we examine the problem in simulation and provide techniques aimed at solving it via deep reinforcement learning. We introduce two straightforward extensions to the Deep Deterministic Policy Gradient algorithm (DDPG), which make it significantly more data-efficient and scalable. Our results show that by making extensive use of off-policy data and replay, it is possible to find high-performance control policies. Further, our results hint that it may soon be feasible to train successful stacking policies by collecting interactions on real robots.
  • TL;DR: Data-efficient deep reinforcement learning can be used to learning precise stacking policies.
  • Keywords: Reinforcement learning, robotics, dexterous manipulation, off-policy learning