Abstract: The vision-based semantic scene completion task aims to predict dense geometric and semantic 3D scene representations from 2D images. However, 3D modeling from a single view is an ill-posed problem, constrained by the limited field of view and the occlusions inherent in image input. Moreover, existing methods tend to produce erroneous scene hallucinations and overly smooth boundary segmentation due to this lack of information. To address these problems, we propose MixSSC, which mixes the sparsity of forward projection with the denseness of depth-prior backward projection, using sparse features to fill information-poor regions and dense features to enhance visible regions. Specifically, we develop a forward-backward mixture module that generates a mixed voxel representation of the scene by leveraging the benefits of both forward and backward projection. We then design a semantic-spatial fusion module that processes the mixed voxel features in a coarse-to-fine manner at the semantic-spatial level. Extensive experimental results on the SemanticKITTI, SSCBench-KITTI-360, and nuScenes datasets demonstrate the superiority of MixSSC. Our code is available at https://github.com/willemeng/MixSSC.
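The core idea of mixing sparse and dense voxel features can be illustrated with a toy sketch (this is an illustration of the general principle, not the authors' implementation; the function name, shapes, and the simple blending rule are assumptions). Forward projection populates only the voxels hit by image rays, leaving many cells empty, while backward projection queries every voxel and is therefore dense but noisier:

```python
import numpy as np

def mix_voxel_features(fwd, bwd):
    """Blend a sparse forward-projected voxel grid with a dense
    backward-projected one. fwd, bwd: (D, H, W, C) float arrays.
    Voxels the forward pass never touched are all-zero; fill those
    from the dense backward grid, and average where both are present.
    (Hypothetical helper; MixSSC uses a learned mixture module.)"""
    occupied = np.any(fwd != 0, axis=-1, keepdims=True)  # (D, H, W, 1) visibility mask
    return np.where(occupied, 0.5 * (fwd + bwd), bwd)

# Tiny demo: 2x2x2 grid with 1 feature channel.
fwd = np.zeros((2, 2, 2, 1))
fwd[0, 0, 0, 0] = 2.0          # only one voxel was seen by forward projection
bwd = np.ones((2, 2, 2, 1))    # dense backward-projected grid
out = mix_voxel_features(fwd, bwd)
print(out[0, 0, 0, 0])  # visible voxel: average of 2.0 and 1.0 -> 1.5
print(out[1, 1, 1, 0])  # unseen voxel: filled from the backward grid -> 1.0
```

In the paper, the actual mixture is produced by the learned forward-backward mixture module rather than a fixed mask-and-average rule; the sketch only conveys the sparse-fills-the-gaps, dense-enhances-the-visible intuition.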
External IDs: dblp:journals/tcsv/WangDLQLT25