Keywords: Pretrained Models, Robot Learning, Imitation Learning, Vision Language Model, Data Augmentation
TL;DR: In the robot imitation learning setting for manipulation, we augment robot demonstration data with large-scale pretrained vision-language, inpainting, and depth estimation models to create new data for policy training.
Abstract: Recent advances in robot learning have shown promise in achieving multitask control and generalisation to novel scenarios, a feat previously difficult to achieve with hand-engineered solutions. However, these results are not as impressive as those achieved by large models trained on internet-scale data; robot learning is still crucially limited by the bottleneck of real-world data collection. To bridge this gap, we propose a data augmentation framework that utilises several large pretrained models to generate additional data from a limited set of human demonstrations. By combining pretrained image segmentation, image inpainting and depth estimation models, we can create new scenarios that are not seen in the dataset, but that are still consistent with the task setup. We demonstrate zero-shot capability on a real robot by training an agent on our augmented dataset to successfully manipulate objects that did not exist in the originally collected data.
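
The abstract gives no implementation detail, but the augmentation pipeline it describes can be sketched roughly as below. This is a minimal Python illustration under stated assumptions: the model interfaces (segment, inpaint, estimate_depth) and the example prompt are hypothetical placeholders, not the paper's actual models or API.

import numpy as np
from typing import Callable

# Hypothetical model interfaces (assumptions, not the authors' code):
#   segment(image)               -> boolean mask of the manipulated object
#   inpaint(image, mask, prompt) -> image with the masked region replaced
#   estimate_depth(image)        -> per-pixel depth map

def augment_frame(
    rgb: np.ndarray,
    segment: Callable[[np.ndarray], np.ndarray],
    inpaint: Callable[[np.ndarray, np.ndarray, str], np.ndarray],
    estimate_depth: Callable[[np.ndarray], np.ndarray],
    prompt: str = "a red mug",  # placeholder description of a novel object
) -> tuple[np.ndarray, np.ndarray]:
    """Replace the task object in one demonstration frame with a novel
    object described by `prompt`, keeping the rest of the scene intact."""
    mask = segment(rgb)                   # locate the original object
    new_rgb = inpaint(rgb, mask, prompt)  # synthesise a replacement object
    new_depth = estimate_depth(new_rgb)   # recover geometry of the new scene
    return new_rgb, new_depth

Applied frame by frame to a small set of human demonstrations, a routine like this would yield training scenes containing objects never present in the collected data while leaving the task-relevant layout unchanged, which is consistent with the zero-shot result the abstract reports.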