Geometric-Aware Mapping and Uncertainty Modeling for Semantic Scene Completion

Published: 01 Jan 2025 · Last Modified: 12 Nov 2025 · ICME 2025 · CC BY-SA 4.0
Abstract: Semantic scene completion aims to simultaneously infer voxel occupancy and semantic categories of a 3D scene from a single depth and/or RGB image. Most existing methods use lossy projection operations (such as MaxPool and AvgPool) to handle the many-to-one problem in the 2D-3D mapping process, which may discard crucial 2D information due to the inherent compression effect. To address this, we propose a novel framework that incorporates Geometric-Aware Mapping (GAM) and Voxel-Wise Uncertainty Modeling (VWUM) to improve accuracy and robustness. Specifically, GAM introduces a distance-weighted mapping strategy that preserves fine-grained 2D details during 2D-to-3D mapping, ensuring that features closer to the voxel center contribute more. Furthermore, VWUM models voxel predictions as Gaussian distributions to explicitly quantify uncertainty, allowing the framework to adaptively estimate confidence levels and mitigate the effects of noisy or ambiguous data. Experimental results on the NYU and NYUCAD datasets show significant improvements in both geometric accuracy and semantic quality.
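The two ideas in the abstract can be illustrated with a minimal sketch. The first function replaces a lossy MaxPool/AvgPool over the pixels that project into a voxel with inverse-distance weighting, so features closer to the voxel center contribute more (the GAM idea); the second is a per-voxel Gaussian negative log-likelihood, the standard way to model a prediction as a Gaussian and down-weight high-variance voxels (the VWUM idea). Function names, the exact weighting scheme, and the loss form are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def distance_weighted_mapping(features, positions, voxel_center, eps=1e-6):
    """Aggregate the 2D features that project into one voxel.

    Instead of MaxPool/AvgPool, each feature is weighted by the inverse
    of its distance to the voxel center, so nearer features dominate.
    (Sketch of the GAM idea; the paper's exact weighting may differ.)

    features     : (N, C) pixel features mapped to this voxel
    positions    : (N, 3) back-projected 3D positions of those features
    voxel_center : (3,) center of the voxel
    returns      : (C,) aggregated voxel feature
    """
    dists = np.linalg.norm(positions - voxel_center, axis=1)
    weights = 1.0 / (dists + eps)          # closer -> larger weight
    weights /= weights.sum()               # normalize to a convex combination
    return weights @ features

def gaussian_nll(mean, log_var, target):
    """Per-voxel Gaussian negative log-likelihood (constant dropped).

    Modeling each voxel prediction as a Gaussian (mean, exp(log_var))
    lets the network report its own confidence: uncertain voxels get a
    large variance, which damps their squared-error term. (Illustrative
    form of the VWUM idea.)
    """
    var = np.exp(log_var)
    return 0.5 * (log_var + (target - mean) ** 2 / var)
```

For example, with two features back-projected at distances 0 and 1 from the voxel center, the aggregated feature is dominated by the nearer one, whereas AvgPool would blend them equally and MaxPool would keep only per-channel maxima.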