- Abstract: Grasping an object and precisely stacking it on another is a difficult task for traditional robotic control or hand-engineered approaches. Here we examine the problem in simulation and provide techniques aimed at solving it via deep reinforcement learning. We introduce two straightforward extensions to the Deep Deterministic Policy Gradient algorithm (DDPG), which make it significantly more data-efficient and scalable. Our results show that by making extensive use of off-policy data and replay, it is possible to find high-performance control policies. Further, our results hint that it may soon be feasible to train successful stacking policies by collecting interactions on real robots.
- TL;DR: Data-efficient deep reinforcement learning can be used to learning precise stacking policies.
- Keywords: Reinforcement learning, robotics, dexterous manipulation, off-policy learning