Zero-Shot Object Manipulation with Semantic 3D Image Augmentation for Perceiver-Actor

15 Apr 2023 (modified: 08 May 2023)
Submitted to ICRA-23 Workshop on Pretraining4Robotics
Readers: Everyone
Keywords: Pretrained Models, Robot Learning, Imitation Learning, Vision Language Model, Data Augmentation
TL;DR: In imitation learning for robot manipulation, we use large-scale pretrained vision-language models to augment robot demonstration data, applying inpainting and depth estimation models to create new data for model training.
Abstract: Recent advances in robot learning have shown promise in achieving multitask control and generalisation to novel scenarios, a feat previously difficult to achieve with hand-engineered solutions. However, these results still fall short of those achieved by large models trained on internet-scale data; robot learning remains critically limited by the bottleneck of real-world data collection. To bridge this gap, we propose a data augmentation framework that utilises several large pretrained models to generate additional data from a limited set of human demonstrations. By combining pretrained image segmentation, image inpainting and depth estimation models, we can create new scenarios that are not seen in the dataset but are still consistent with the task setup. We demonstrate zero-shot capability on a real robot by training an agent on our augmented dataset to successfully manipulate objects that did not exist in the original collected data.
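The abstract does not give implementation details, so what follows is only a minimal sketch, under our own assumptions, of how the outputs of the three pretrained models might be merged into a new RGB-D frame for a 3D agent such as Perceiver-Actor, which operates on voxelized RGB-D observations. It assumes the segmentation mask, the inpainted (edited) image and a monocular depth estimate of that image have already been computed and are passed in as NumPy arrays. The function names `align_depth`, `augment_frame` and `backproject` are hypothetical, and the least-squares scale-and-shift alignment is a common technique for relative monocular depth that we swap in for illustration, not necessarily the authors' step.

```python
import numpy as np


def align_depth(pred, ref, valid):
    """Fit scale s and shift t so that s * pred + t matches ref (least squares)
    on the `valid` pixels, then apply them to the whole predicted map.
    Assumption: the depth estimator is only correct up to scale and shift."""
    a = np.stack([pred[valid], np.ones(int(valid.sum()))], axis=1)
    (s, t), *_ = np.linalg.lstsq(a, ref[valid], rcond=None)
    return s * pred + t


def augment_frame(rgb, depth, mask, edited_rgb, est_depth):
    """Merge an edited (e.g. inpainted) image into an RGB-D observation.

    Inside `mask` the new colours and the aligned estimated depth are used;
    outside it the original camera colours and sensor depth are kept.
    """
    aligned = align_depth(est_depth, depth, ~mask)  # align on untouched pixels
    new_rgb = np.where(mask[..., None], edited_rgb, rgb)  # broadcast over channels
    new_depth = np.where(mask, aligned, depth)
    return new_rgb, new_depth


def backproject(depth, fx, fy, cx, cy):
    """Lift a metric depth map to an (H*W, 3) camera-frame point cloud with a
    pinhole camera model (fx, fy: focal lengths; cx, cy: principal point)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column / row indices
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

In a full pipeline the resulting point cloud would presumably be transformed into the robot frame with the camera extrinsics before voxelization; that step is omitted here. Aligning the estimated depth only on pixels outside the mask keeps the edited region geometrically consistent with the unchanged sensor depth around it.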