LDCNet: Long-Distance Context Modeling for Large-Scale 3D Point Cloud Scene Semantic Segmentation

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Large-scale point cloud semantic segmentation is a challenging task in 3D computer vision. A key challenge is resolving the ambiguities that arise from locally high inter-class similarity. In this study, we address this by modeling long-distance contextual information to understand the scene's overall layout. The context sensitivity of previous methods is typically constrained to small blocks (e.g., $2\,m \times 2\,m$) and cannot be directly extended to the entire scene. For this reason, we propose the \textbf{L}ong-\textbf{D}istance \textbf{C}ontext Modeling Network (LDCNet). Our key insight is that keypoints are sufficient for inferring the layout of a scene. We therefore represent the entire scene using keypoints together with their local descriptors and model long-distance context on these keypoints. Finally, we propagate the long-distance context information from the keypoints back to the non-keypoints. This allows our method to model long-distance context effectively. Experiments on six datasets demonstrate that our approach effectively mitigates such ambiguities. Our method performs well on large, irregular objects and generalizes well to typical scenarios.
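The three-stage pipeline the abstract describes (select keypoints, model scene-wide context among them, propagate back to all points) can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the choice of farthest point sampling, plain softmax self-attention, and nearest-keypoint propagation are all assumptions standing in for LDCNet's actual components.

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: pick k well-spread keypoints from an (N, 3) array."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = [int(rng.integers(n))]
    dist = np.full(n, np.inf)
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen keypoint.
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(dist.argmax()))
    return np.array(chosen)

def attend_over_keypoints(feats):
    """Softmax self-attention among keypoint features (M, C): every keypoint
    aggregates information from every other keypoint, i.e. scene-wide context."""
    scores = feats @ feats.T / np.sqrt(feats.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ feats

def propagate_to_all(points, key_idx, key_feats):
    """Assign each point the context feature of its nearest keypoint."""
    d = np.linalg.norm(points[:, None, :] - points[key_idx][None, :, :], axis=2)
    return key_feats[d.argmin(axis=1)]

# Toy run: 200 points with 8-dim local descriptors, 16 keypoints.
pts = np.random.default_rng(1).normal(size=(200, 3))
feats = np.random.default_rng(2).normal(size=(200, 8))
idx = farthest_point_sampling(pts, 16)
ctx = attend_over_keypoints(feats[idx])      # long-distance context on keypoints
full_ctx = propagate_to_all(pts, idx, ctx)   # propagated back to every point
assert full_ctx.shape == (200, 8)
```

In a real network the attention and propagation steps would be learned layers operating on deep features; the sketch only shows why restricting global modeling to keypoints keeps the attention cost at $O(M^2)$ for $M \ll N$ instead of $O(N^2)$ over the full scene.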
Relevance To Conference: Point clouds have become increasingly popular thanks to applications such as digital preservation, reverse engineering, surveying, architecture, 3D gaming, robotics, and virtual reality, and deep-learning-based tools for analyzing them have become a hot topic. In recent years, many deep networks for point cloud semantic segmentation have been proposed, but most focus on object-level point clouds. Large-scale scenes are difficult to process and analyze because of their huge number of points and complex geometry. In this paper, we present a new method for large-scale scene semantic segmentation and demonstrate its effectiveness on six datasets containing both indoor and outdoor scenes, with applications including but not limited to interior design, autonomous driving, and cultural heritage preservation.
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Content] Vision and Language, [Experience] Multimedia Applications
Submission Number: 1635
