DGOcc: Depth-aware global query-based network for monocular 3D occupancy prediction

Published: 01 Jan 2025 · Last Modified: 03 Aug 2025 · Neurocomputing 2025 · CC BY-SA 4.0
Abstract: Monocular 3D occupancy prediction, which aims to predict the occupancy and semantics within regions of interest of a 3D scene from a single 2D image, has recently garnered increasing attention for its vital role in 3D scene understanding. While prior studies focus primarily on either effectiveness or efficiency, achieving high performance with limited computational resources remains challenging. In this paper, we present DGOcc, a Depth-aware Global query-based network for monocular 3D Occupancy prediction that achieves both. We first exploit prior depth maps to extract depth context features that provide explicit geometric information for the occupancy network. Then, to make full use of these depth context features, we propose a Global Query-based (GQ) Module, in which attention mechanisms cooperate with scale-aware operations to facilitate feature interaction between the image and the 3D voxels. Moreover, a Hierarchical Selective Supervision (HSS) Strategy is designed to avoid upsampling all of the high-dimensional 3D voxel features to full resolution, reducing GPU memory usage and time cost. Extensive experiments on the SemanticKITTI and SSCBench-KITTI-360 datasets demonstrate that the proposed method surpasses current state-of-the-art methods on monocular semantic occupancy prediction while incurring lower GPU memory and time overhead.
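The abstract does not detail the internals of the GQ Module, but its description (attention-driven interaction between image features and 3D voxels) suggests a cross-attention design in which a compact set of global queries aggregates depth-aware image context. The PyTorch sketch below illustrates that reading only; the class name GlobalQueryModule, the learnable queries, and all shapes are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class GlobalQueryModule(nn.Module):
    """Hypothetical sketch: a small set of learnable global queries
    cross-attends to flattened, depth-aware image features. Not the
    paper's implementation; shapes and names are illustrative."""

    def __init__(self, dim=128, num_queries=256, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, context):
        # context: (B, N, dim) image features fused with depth context,
        # flattened over the spatial dimensions (N = H * W).
        b = context.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, Q, dim)
        out, _ = self.attn(q, context, context)          # queries gather global scene cues
        return self.norm(out + q)                        # (B, Q, dim), later lifted to 3D voxels
```

Keeping the query count small relative to the number of voxels is what would make such a module cheap: attention cost scales with the number of queries rather than with the full 3D grid.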
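Similarly, the HSS Strategy is described only as avoiding full-resolution upsampling of the high-dimensional voxel features. One plausible reading is that coarse-resolution predictions are supervised directly against a downscaled label grid. The sketch below assumes nearest-neighbour label subsampling as a placeholder for whatever selection scheme the paper actually uses; the function name and signature are hypothetical.

```python
import torch
import torch.nn.functional as F

def coarse_occupancy_loss(coarse_logits, gt_full, ignore_index=255):
    # Hypothetical sketch of supervision at a reduced resolution.
    # coarse_logits: (B, C, X, Y, Z) class logits at a coarse voxel grid.
    # gt_full:       (B, s*X, s*Y, s*Z) integer labels at full resolution,
    #                where s is an integer downscale factor.
    scale = gt_full.shape[-1] // coarse_logits.shape[-1]
    # Nearest-neighbour label subsampling; a real implementation might use
    # majority pooling or a learned selection over each s^3 block.
    gt_coarse = gt_full[:, ::scale, ::scale, ::scale].long()
    # Cross-entropy supports K-dimensional targets, so the high-dimensional
    # voxel features never need to be upsampled to full resolution.
    return F.cross_entropy(coarse_logits, gt_coarse, ignore_index=ignore_index)
```

Supervising the downsampled grid keeps both the logits tensor and the loss computation at the coarse resolution, which is consistent with the memory and time savings the abstract claims.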