Envisioning the Unseen: Revolutionizing Indoor Spaces with Deep Learning-Enhanced 3D Semantic Segmentation

29 Mar 2024 (modified: 27 Apr 2024) · Submitted to VLADR 2024 · CC BY 4.0
Keywords: Semantic Segmentation, Deep Learning, Indoor Environment, Image Visualization
TL;DR: Semantic segmentation of indoor environments
Abstract: In recent years, advances in indoor sensing and 3D model acquisition have produced a rapidly growing volume of indoor three-dimensional (3D) point cloud models. These large but "blind" (semantically unlabeled) point clouds pose substantial obstacles for advanced indoor applications and GIS analysis. To address this gap, our study investigates the spatial dimension of semantic segmentation using convolutional neural networks (CNNs). Working with a structured dataset split into training and testing partitions, we predict point cloud characteristics, assess model accuracy, compare the results against current state-of-the-art benchmarks, and visualize the outcomes to demonstrate the model's efficacy.

Segmentation of 3D point clouds has historically been hindered by the absence of robust 3D features, limited 3D training data, and the complexity of indoor environments: heavy occlusion, uneven lighting, and diverse objects. To overcome these challenges, we propose an effective algorithm for transferring semantic labels from 2D semantic images to raw 3D point clouds, laying a foundation for 3D semantic point cloud models that resolve ambiguous object semantics and unclear spatial structure. Exploiting the attributes of structure-from-motion (SfM) point cloud models and large 2D image databases, the algorithm estimates the structural layout and semantic labels of images and propagates this information to the 3D point cloud, simplifying the extraction of structure and semantics from 3D data.

To handle complex indoor settings, we introduce a new architectural component, the Large-scale Residual Connection, which carries spatial information from lower to higher network levels. We further incorporate the Atrous Spatial Pyramid Pooling (ASPP) of DeepLabv3+ and the DenseBlock structure of DenseNet, together with a multi-stage training strategy, to cope with the occlusion and complexity of indoor environments.

Our methodology comprises two major innovations. First, a novel Combined Network labels 2D images and estimates indoor spatial layouts, strengthening classification. Second, a 2D-3D label propagation based on a graphical model transfers labels from 2D to 3D while enforcing contextual consistency between images. Notably, our approach requires no 3D scene training data, yet it achieves strong segmentation results in complex indoor scenes, with an accuracy of 87%. Experiments on the public NYUDv2 indoor dataset and a proprietary local dataset show that, compared with leading 2D semantic segmentation techniques, our DeepLabv3+-based network learns discriminative features for inter-class segmentation while preserving clear boundaries for intra-class distinctions.
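The abstract describes transferring semantic labels from posed 2D images to an SfM point cloud but does not specify the mechanism. Below is a minimal sketch of one plausible realization: project each 3D point into every labeled view using the SfM camera parameters and assign the majority-vote class. All names here (`transfer_labels`, the `K`/`R`/`t`/`labels` fields) are hypothetical, and occlusion handling (e.g., a z-buffer test) is deliberately omitted.

```python
import numpy as np

def transfer_labels(points, views, num_classes):
    """points: (N, 3) world coordinates from SfM.
    views: iterable of dicts with 'K' (3x3 intrinsics), 'R' (3x3 rotation),
    't' (3,) translation, and 'labels' (H x W semantic map) -- hypothetical
    field names. Returns one label per point by majority vote across views."""
    votes = np.zeros((len(points), num_classes), dtype=np.int64)
    for view in views:
        cam = points @ view["R"].T + view["t"]   # world -> camera coordinates
        front = cam[:, 2] > 1e-6                 # keep points in front of the camera
        proj = cam[front] @ view["K"].T          # pinhole projection
        col = np.round(proj[:, 0] / proj[:, 2]).astype(int)
        row = np.round(proj[:, 1] / proj[:, 2]).astype(int)
        h, w = view["labels"].shape
        ok = (col >= 0) & (col < w) & (row >= 0) & (row < h)
        idx = np.where(front)[0][ok]             # indices of visible points
        votes[idx, view["labels"][row[ok], col[ok]]] += 1
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = -1          # never-observed points stay unlabeled
    return labels
```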
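ASPP, by contrast, is a documented DeepLabv3+ component: parallel atrous convolutions at several dilation rates plus image-level pooling, concatenated and fused by a 1x1 convolution. A compact PyTorch rendering follows; the class name and channel defaults are our choices, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling in the DeepLabv3+ style."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        # 1x1 branch plus one 3x3 atrous branch per dilation rate.
        self.branches = nn.ModuleList([nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))])
        for r in rates:
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True)))
        # Image-level pooling branch captures global context.
        self.image_pool = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        # Fuse all branches back to out_ch channels.
        self.project = nn.Sequential(
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [branch(x) for branch in self.branches]
        feats.append(F.interpolate(self.image_pool(x), size=(h, w),
                                   mode="bilinear", align_corners=False))
        return self.project(torch.cat(feats, dim=1))
```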
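The Large-scale Residual Connection is the paper's own construct and is described only as carrying spatial information from lower to higher levels, so any code for it is speculative; the DenseBlock, however, follows DenseNet. Below is a sketch of a DenseNet-style block together with a hypothetical long skip that resizes a low-level feature map and concatenates it with a higher-level one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseBlock(nn.Module):
    """DenseNet-style block: each layer sees the concatenation of all
    previous feature maps and contributes `growth` new channels."""
    def __init__(self, in_ch, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, 3, padding=1, bias=False)))
            ch += growth
        self.out_channels = ch  # in_ch + n_layers * growth

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)  # dense connectivity
        return x

def long_skip(low, high):
    """Hypothetical 'large-scale' skip: resize a low-level feature map to
    the higher-level map's spatial size and concatenate channel-wise."""
    low = F.interpolate(low, size=high.shape[2:], mode="bilinear",
                        align_corners=False)
    return torch.cat([high, low], dim=1)
```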
Supplementary Material: pdf
Submission Number: 12