Automatic generation of realistic training data for learning parallel-jaw grasping from synthetic stereo images
Abstract: This paper proposes a novel approach to automatically generating labeled training data for predicting parallel-jaw grasps from stereo-matched depth images. We generate realistic depth images by applying Semi-Global Matching to synthetic stereo pairs, producing disparity maps that mimic the typical artifacts of real stereo matching and thereby reducing the gap between simulation and real-world execution. Our pipeline automatically generates grasp annotations for single or multiple objects in the synthetically rendered scenes, avoiding manual image pre-processing steps such as inpainting or denoising. The labeled data is then used to train a CNN model that predicts parallel-jaw grasps, even in scenarios with large amounts of missing depth values. We further show that scene properties such as the presence of obstacles (a bin, for instance) can be added to our pipeline, and the resulting training yields grasp prediction success rates of up to 90%.
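The key idea of the data-generation step, rendering a synthetic stereo pair and recovering depth through stereo matching so that the depth images carry realistic matching artifacts, can be illustrated with a toy sketch. The paper uses Semi-Global Matching; the simplified stand-in below uses plain SAD block matching, which corresponds only to SGM's per-pixel data term and omits the path-wise smoothness aggregation. All function names, image sizes, and parameters here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def block_match_disparity(left, right, max_disp=8, win=3):
    """Naive SAD block matching: a simplified stand-in for SGM's data term.

    For each pixel in the left image, compare a small window against
    horizontally shifted windows in the right image and pick the shift
    (disparity) with the lowest sum-of-absolute-differences cost.
    """
    h, w = left.shape
    pad = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(pad, h - pad):
        for x in range(pad + max_disp, w - pad):
            patch = left[y - pad:y + pad + 1, x - pad:x + pad + 1]
            costs = [
                np.abs(patch - right[y - pad:y + pad + 1,
                                     x - d - pad:x - d + pad + 1]).sum()
                for d in range(max_disp)
            ]
            disp[y, x] = int(np.argmin(costs))
    return disp

# "Synthetic scene": random texture standing in for a rendered left view.
# The right view is the left view shifted by 4 px, i.e. a fronto-parallel
# plane with constant disparity 4 under the convention x_right = x_left - d.
rng = np.random.default_rng(0)
left = rng.random((32, 48))
shift = 4
right = np.roll(left, -shift, axis=1)

disp = block_match_disparity(left, right)
pad, max_disp = 1, 8
interior = disp[pad:-pad, pad + max_disp:-pad]
```

With an exactly shifted pair the SAD cost is zero at the true disparity, so the interior of the recovered map equals the shift everywhere; on real renderings with occlusions and texture-poor regions, the matcher instead produces the invalid and noisy depth values that make the training data realistic. Full SGM additionally aggregates costs with smoothness penalties along several scanline directions.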