Learning Accurate Objectness Instance Segmentation from Photorealistic Rendering for Robotic Manipulation

Siyi Li, Jiaji Zhou, Zhenzhong Jia, Dit-Yan Yeung, Matthew T. Mason

ISER 2018 (modified: 08 Nov 2022)

Abstract: Recent progress in computer vision has been driven by high-capacity deep convolutional neural network (CNN) models trained on large generic datasets. However, creating large datasets with dense pixel-level labels is extremely costly. In this paper, we focus on the problem of instance segmentation for robotic manipulation using rich image and depth features. To avoid intensive human labeling, we develop an automated rendering pipeline for rapidly generating labeled datasets. Given 3D object models as input, the rendering pipeline produces photorealistic images with pixel-accurate semantic label maps and depth maps. We then use the synthetic dataset to train an RGB-D segmentation model that extends the Mask R-CNN framework with depth input fusion. Our results open up new possibilities for advancing robotic perception using cheap, large-scale synthetic data.
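As an illustration of how rendered labels could feed an instance segmentation model, the sketch below converts a label map into per-object binary masks and bounding boxes, and stacks depth onto RGB as a fourth channel. It is a minimal sketch under stated assumptions, not the authors' pipeline: it assumes the label map stores one integer ID per object instance with 0 as background, and the channel stacking shown is only the simplest form of depth fusion, which may differ from the fusion used in the paper. The function names `instance_targets` and `stack_rgbd` are hypothetical.

```python
import numpy as np


def instance_targets(label_map, background_id=0):
    """Split an instance-ID label map (H, W) into per-object masks and boxes.

    Assumption: each rendered object has a unique integer ID and
    `background_id` marks pixels that belong to no object.
    """
    targets = []
    for inst_id in np.unique(label_map):
        if inst_id == background_id:
            continue
        mask = label_map == inst_id
        # Rows and columns that contain at least one pixel of this instance.
        rows = np.flatnonzero(mask.any(axis=1))
        cols = np.flatnonzero(mask.any(axis=0))
        # Box as (x_min, y_min, x_max, y_max) with exclusive upper bounds.
        box = (int(cols[0]), int(rows[0]), int(cols[-1]) + 1, int(rows[-1]) + 1)
        targets.append({"id": int(inst_id), "mask": mask, "box": box})
    return targets


def stack_rgbd(rgb, depth):
    """Concatenate RGB (H, W, 3) and depth (H, W) into one (H, W, 4) array.

    Assumption: simple early fusion by channel stacking, shown only
    as an example of combining the two rendered modalities.
    """
    rgb = rgb.astype(np.float32)
    depth = depth.astype(np.float32)[..., None]
    return np.concatenate([rgb, depth], axis=-1)
```

Each returned mask and box pair has the shape of annotation that Mask R-CNN style training typically consumes, which is why pixel-accurate rendered labels can stand in for manual annotation.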
