SPHERE: Semantic-PHysical Engaged REpresentation for 3D Semantic Scene Completion

Zhiwen Yang, Yuxin Peng

Published: 04 Jul 2025, Last Modified: 28 Jan 2026, Venue: ACM MM 2025, License: CC BY 4.0

Abstract: Camera-based 3D Semantic Scene Completion (SSC) is a critical task in autonomous driving systems, aiming to assess voxel-level geometry and semantics for holistic scene perception. While existing voxel-based and plane-based SSC methods have achieved considerable progress in semantic perception accuracy, they struggle to capture the physical regularities needed for realistic geometric details. Neural reconstruction methods such as NeRF and 3DGS, on the other hand, demonstrate superior physical awareness but suffer from high computational cost and slow convergence on large-scale, complex autonomous driving scenes, leading to inferior semantic accuracy. To address these issues, we propose the Semantic-PHysical Engaged REpresentation (SPHERE) for camera-based 3D semantic scene completion, which integrates voxel and Gaussian representations to jointly exploit semantic and physical information. First, we propose a Semantic-guided Gaussian Initialization (SGI) module that leverages dual-branch 3D scene representations to locate focal voxels with discriminative semantics as anchors, enabling efficient and effective Gaussian initialization. Then, we propose a Physical-aware Harmonics Enhancement (PHE) module that incorporates semantic spherical harmonics to model physical-aware contextual details and jointly enhances the voxel and Gaussian representations through focal distribution alignment, promoting semantic-geometry consistency and yielding semantic scene completion results with realistic details. Extensive experiments and analyses on the SemanticKITTI and SSCBench-KITTI-360 datasets validate the effectiveness of SPHERE.
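The abstract does not specify how focal voxels are chosen, so the following is only an illustrative sketch of the general idea of anchoring Gaussians on semantically confident voxels. It assumes, hypothetically, that "discriminative semantics" can be approximated by the maximum softmax probability of per-voxel class scores, and that the top-k voxels by that score serve as anchors. The function name `select_focal_voxels`, the `(X, Y, Z, C)` layout, and the `voxel_size` and `origin` parameters are all assumptions made for illustration, not part of SPHERE.

```python
import numpy as np


def select_focal_voxels(logits, k, voxel_size=0.2, origin=(0.0, 0.0, 0.0)):
    """Pick the k voxels with the most confident semantic prediction.

    logits: (X, Y, Z, C) array of per-voxel class scores (assumed layout).
    Returns (centers, labels): (k, 3) metric voxel centers and (k,) class ids.
    The confidence criterion is an assumption, not the paper's definition.
    """
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)

    confidence = probs.max(axis=-1)   # (X, Y, Z) peak class probability
    labels = probs.argmax(axis=-1)    # (X, Y, Z) predicted class id

    # Indices of the k most confident voxels, sorted by descending confidence.
    flat = confidence.ravel()
    top = np.argpartition(flat, -k)[-k:]
    top = top[np.argsort(flat[top])[::-1]]

    # Convert flat indices to grid coordinates, then to metric voxel centers.
    idx = np.stack(np.unravel_index(top, confidence.shape), axis=1)
    centers = (idx + 0.5) * voxel_size + np.asarray(origin)
    return centers, labels.ravel()[top]
```

In such a sketch, the returned centers could seed the Gaussian means and the labels could seed per-Gaussian semantics, so that the number of Gaussians scales with `k` rather than with the full voxel grid.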