Keywords: robot manipulation, imitation learning, object-centric representations
TL;DR: We introduce an imitation learning method for training policies that generalize beyond their demonstration settings.
Abstract: We introduce GROOT, an imitation learning method that uses object-centric and 3D priors to learn robust policies for vision-based manipulation. GROOT trains policies that generalize beyond their initial training conditions. GROOT first constructs object-centric 3D representations that are robust to background changes and camera views, and then reasons over these representations with a transformer-based policy. At test time, we introduce a segmentation correspondence model that allows policies to reuse their manipulation strategies when handling new objects. Through comprehensive experiments, we validate the robustness of GROOT policies against perceptual variations in both simulated and real-world environments. GROOT outperforms traditional methods based on pixel inputs or object-proposal priors across all considered aspects of perceptual generalization. We also extensively evaluate GROOT policies on real robots, demonstrating their efficacy under drastic changes in setup.
Student First Author: yes
Supplementary Material: zip
Instructions: I have read the instructions for authors (https://corl2023.org/instructions-for-authors/)