Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods

Published: 17 Nov 2022, Last Modified: 22 Oct 2023. PRL 2022 Poster.
Keywords: Visual Pre-training, Robot Manipulation
Abstract: Visual pre-training with large-scale real-world data has made substantial progress in recent years, showing great potential for robot learning with pixel observations. However, recipes for visual pre-training on robot manipulation tasks have yet to be established. In this paper, we first thoroughly investigate the effects of pre-training from three fundamental perspectives: datasets, model architectures, and training methods, and report several observations that benefit robot manipulation learning. We then propose a visual pre-training scheme for robot manipulation, termed Vi-PRoM, which combines self-supervised learning and multi-task supervised learning. Concretely, the former employs contrastive learning to acquire underlying patterns from large-scale unlabeled data, while the latter learns visual semantics and temporal dynamics to facilitate robot manipulation tasks. Extensive experiments on robot manipulation tasks in various simulation environments and on a real robot demonstrate the superiority of the proposed scheme. We hope our study can motivate further research on this topic.
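The abstract does not specify Vi-PRoM's encoder, heads, or loss formulation, so the sketch below is only a minimal illustration of the stated idea: one shared visual encoder trained jointly with a contrastive self-supervised objective and two supervised objectives (visual semantics and temporal dynamics). The class name `ViPRoMStylePretrainer`, the ResNet-50 backbone, the InfoNCE-style contrastive loss, the classification head for semantics, the next-frame feature-prediction head for dynamics, and the equal loss weighting are all assumptions for illustration, not the paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class ViPRoMStylePretrainer(nn.Module):
    """Hypothetical sketch: contrastive + multi-task supervised
    pre-training on a single shared visual encoder."""

    def __init__(self, num_classes: int = 1000, proj_dim: int = 128,
                 temperature: float = 0.07):
        super().__init__()
        self.encoder = models.resnet50(weights=None)
        self.encoder.fc = nn.Identity()                # expose 2048-d features
        self.proj = nn.Linear(2048, proj_dim)          # contrastive projection
        self.cls_head = nn.Linear(2048, num_classes)   # visual-semantics head
        self.dyn_head = nn.Linear(2048, 2048)          # temporal-dynamics head
        self.temperature = temperature

    def forward(self, view1, view2, frame_t, frame_t1, labels):
        # Self-supervised branch: InfoNCE between two augmented views of
        # the same images; matching batch indices are the positives.
        z1 = F.normalize(self.proj(self.encoder(view1)), dim=1)
        z2 = F.normalize(self.proj(self.encoder(view2)), dim=1)
        logits = z1 @ z2.t() / self.temperature
        targets = torch.arange(z1.size(0), device=z1.device)
        loss_contrast = F.cross_entropy(logits, targets)

        # Supervised branch 1: classify the visual semantics of a frame.
        feat_t = self.encoder(frame_t)
        loss_semantics = F.cross_entropy(self.cls_head(feat_t), labels)

        # Supervised branch 2: predict the next frame's features from the
        # current frame (a simple stand-in for temporal dynamics).
        feat_t1 = self.encoder(frame_t1)
        loss_dynamics = F.mse_loss(self.dyn_head(feat_t), feat_t1.detach())

        # Equal weighting is an arbitrary illustrative choice.
        return loss_contrast + loss_semantics + loss_dynamics
```

As a smoke test, random 224x224 image batches and integer labels suffice, e.g. `model(x, x, x, x, torch.randint(0, 1000, (4,)))` with `x = torch.randn(4, 3, 224, 224)`; the returned scalar supports `backward()`. The paper's actual loss weighting, augmentations, and head designs may differ.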
Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/arxiv:2308.03620/code)