G$^2$-Occ: Geometry-Guided Gaussian Primitives for Embodied Semantic Occupancy Prediction

10 Sept 2025 (modified: 11 Feb 2026), Submitted to ICLR 2026, CC BY 4.0
Keywords: Embodied Occupancy Prediction, 3D Gaussian
Abstract: This paper introduces a vision-only framework for embodied semantic occupancy prediction based on geometry-guided Gaussian primitives. Our approach implicitly recovers scene geometry from monocular color images via pre-trained depth and normal estimation models. Rather than relying on traditional random or uniform initialization strategies, the framework uses the recovered geometric priors to manage the entire lifecycle of Gaussian primitives: initialization, updating, and eventual pruning. Specifically, we design a Geometry-Guided Initialization module that uses the recovered geometry to generate Gaussian primitives within potentially occupied regions of the scene, so that primitives are distributed sensibly and efficiently from the outset. We then propose a Position-Aware Scene Update and Pruning pipeline, which integrates a Position-Aware Gaussian Refinement process and a Confidence-Based Fusion and Pruning module. This pipeline maintains the global consistency of the scene representation across continuous online observations while adaptively fusing redundant primitives to limit computational cost. Extensive experiments on four popular indoor semantic occupancy prediction benchmarks validate the method, which achieves state-of-the-art performance.
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 3602
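
As a rough illustration of the geometry-guided initialization idea described in the abstract, the sketch below lifts a predicted depth map to 3D points with a pinhole camera model and places one Gaussian primitive at each valid point. This is not the authors' implementation: the function names (`backproject_depth`, `init_gaussians`), the pinhole intrinsics `fx, fy, cx, cy`, the pixel `stride`, and the isotropic `base_scale` are all assumptions made for the example, and the use of estimated normals is omitted.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy, stride=4):
    """Lift a depth map (H, W) to camera-frame 3D points on a strided pixel grid.

    Assumes a pinhole camera; pixels with non-positive depth are skipped.
    """
    h, w = depth.shape
    vs, us = np.meshgrid(
        np.arange(0, h, stride), np.arange(0, w, stride), indexing="ij"
    )
    z = depth[vs, us]
    valid = z > 0  # keep only pixels with a usable depth estimate
    u, v, z = us[valid], vs[valid], z[valid]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)


def init_gaussians(depth, fx, fy, cx, cy, stride=4, base_scale=0.05):
    """Place one Gaussian at each back-projected point (hypothetical layout).

    Returns a dict with 'means' (N, 3) and isotropic 'scales' (N, 3).
    """
    means = backproject_depth(depth, fx, fy, cx, cy, stride)
    scales = np.full((len(means), 3), base_scale)
    return {"means": means, "scales": scales}
```

Compared with random or uniform initialization, every primitive here starts on a surface suggested by the depth estimate, which is the intuition behind generating primitives only in potentially occupied regions.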