Keywords: Manipulation policy, Imitation learning, 3D reconstruction, Diffusion policies
TL;DR: We equip a 3D Diffusion Policy with a metric-semantic reconstruction in order to solve tasks that require spatial memory.
Abstract: End-to-end learning of robot control policies, structured as neural networks, has emerged as a promising approach to robotic manipulation. In many common tasks, relevant objects pass in and out of the robot’s field of view. In these settings, spatial memory (the ability to remember the spatial composition of the scene) is an important competency. However, building such mechanisms into robot learning systems remains an open research problem. We introduce mindmap (Spatial Memory in Deep Feature Maps for 3D Action Policies), a 3D diffusion policy that generates robot trajectories based on a semantic 3D reconstruction of the environment. We show in simulation experiments that our approach is effective at solving tasks where state-of-the-art approaches without memory mechanisms struggle. We will release our reconstruction system, training code, and evaluation tasks to spur research in this direction.
Supplementary Material: zip
Submission Number: 7