VIOLA: Object-Centric Imitation Learning for Vision-Based Robot Manipulation

Yifeng Zhu; Abhishek Joshi; Peter Stone; Yuke Zhu

Published: 10 Sept 2022, Last Modified: 05 May 2023, CoRL 2022 Poster, Readers: Everyone

Keywords: Imitation Learning, Robot Manipulation, Object-Centric Representation

TL;DR: We introduce an object-centric imitation learning approach for robot manipulation that acquires robust, closed-loop visuomotor policies.

Abstract: We introduce VIOLA, an object-centric imitation learning approach to learning closed-loop visuomotor policies for robot manipulation. Our approach constructs object-centric representations based on general object proposals from a pre-trained vision model. VIOLA uses a transformer-based policy to reason over these representations and attend to the task-relevant visual factors for action prediction. Such object-based structural priors improve the robustness of deep imitation learning algorithms to object variations and environmental perturbations. We quantitatively evaluate VIOLA in simulation and on real robots. VIOLA outperforms the state-of-the-art imitation learning methods by $45.8\%$ in success rate. It has also been deployed successfully on a physical robot to solve challenging long-horizon tasks, such as dining table arrangement and coffee making. More videos and model details can be found in the supplementary material and on the project website: https://ut-austin-rpl.github.io/VIOLA/.
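
To make the idea of attending over object-centric tokens concrete, the following is a minimal sketch and not the VIOLA model. It assumes each object proposal has already been reduced to a fixed-length feature vector, and it applies plain scaled dot-product attention from a single query token over those vectors before mapping the pooled result to an action. The names `attend`, `predict_action`, `action_query`, and `action_head`, the 7-dimensional action, and the random placeholder features are all hypothetical. A real policy would use learned weights and features from a pre-trained vision model.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, tokens):
    """Scaled dot-product attention of one query over a list of tokens.

    Returns the attention weights and the weighted sum of the tokens.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
    return weights, pooled


def predict_action(region_features, action_query, action_head):
    """Pool region features with attention, then apply a linear action head."""
    weights, pooled = attend(action_query, region_features)
    action = [dot(row, pooled) for row in action_head]
    return action, weights


# Placeholder inputs (assumptions): 5 object proposals, 8-dim features, 7-dim action.
rng = random.Random(0)
dim, num_regions, action_dim = 8, 5, 7
region_features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_regions)]
action_query = [rng.gauss(0, 1) for _ in range(dim)]
action_head = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(action_dim)]

action, weights = predict_action(region_features, action_query, action_head)
```

The attention weights sum to one across the proposals, so in this toy setting they can be read as how strongly each region contributes to the predicted action.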

Student First Author: yes

Supplementary Material: zip

Website: https://ut-austin-rpl.github.io/VIOLA/

Code: https://github.com/UT-Austin-RPL/VIOLA
