Learning Occupancy for Monocular 3D Object Detection

Published: 01 Jan 2024, Last Modified: 13 Nov 2024, CVPR 2024, CC BY-SA 4.0
Abstract: Monocular 3D detection is a challenging task due to the lack of accurate 3D information. Existing approaches typically rely on geometry constraints and dense depth estimates to facilitate learning, but often fail to fully exploit the benefits of three-dimensional feature extraction in frustum and 3D space. In this paper, we propose OccupancyM3D, a method of learning occupancy for monocular 3D detection. It directly learns occupancy in frustum and 3D space, leading to more discriminative and informative 3D features and representations. Specifically, using synchronized raw sparse LiDAR point clouds, we define the space status and generate voxel-based occupancy labels. We formulate occupancy prediction as a simple classification problem and design associated occupancy losses. The resulting occupancy estimates are used to enhance the original frustum and 3D features. Experiments on the KITTI and Waymo Open datasets demonstrate that the proposed method achieves a new state of the art and surpasses other methods by a significant margin.
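The three steps named in the abstract (voxel occupancy labels from sparse LiDAR, occupancy prediction as classification, and occupancy-based feature enhancement) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name in it (`occupancy_labels`, `occupancy_bce`, `enhance_features`, `pc_range`, `voxel_size`) is hypothetical. Several simplifying assumptions apply: only the 3D voxel case is shown (not the frustum case); a voxel is labeled occupied if it contains at least one LiDAR point and empty otherwise, whereas the paper's "space status" definition may distinguish further states (an optional `valid` mask stands in for ignoring such voxels); plain binary cross-entropy substitutes for the paper's occupancy losses; and enhancement is assumed to be element-wise gating of features by predicted occupancy probability.

```python
import numpy as np


def occupancy_labels(points, pc_range, voxel_size):
    """Hypothetical label generator: 1.0 for voxels containing >= 1 LiDAR point, else 0.0.

    points:     (N, >=3) array; only x, y, z are used.
    pc_range:   (x_min, y_min, z_min, x_max, y_max, z_max), an assumed convention.
    voxel_size: (dx, dy, dz).
    """
    points = np.asarray(points, dtype=np.float64)[:, :3]
    lo = np.asarray(pc_range[:3], dtype=np.float64)
    hi = np.asarray(pc_range[3:], dtype=np.float64)
    size = np.asarray(voxel_size, dtype=np.float64)
    grid_shape = np.round((hi - lo) / size).astype(int)
    labels = np.zeros(tuple(grid_shape), dtype=np.float32)

    # Keep only points that fall inside the voxelized range.
    inside = np.all((points >= lo) & (points < hi), axis=1)
    idx = np.floor((points[inside] - lo) / size).astype(int)
    idx = np.minimum(idx, grid_shape - 1)  # guard against float rounding at the upper edge
    labels[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return labels


def occupancy_bce(logits, labels, valid=None):
    """Per-voxel binary cross-entropy on logits (a stand-in for the paper's occupancy losses).

    Uses the numerically stable form softplus(x) - x * y.
    valid: optional boolean mask of voxels to supervise (assumption).
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    loss = np.logaddexp(0.0, logits) - logits * labels
    if valid is not None:
        if not np.any(valid):
            return 0.0
        loss = loss[valid]
    return float(loss.mean())


def enhance_features(features, occ_prob):
    """Assumed enhancement: gate (C, X, Y, Z) features by (X, Y, Z) occupancy probabilities."""
    return features * occ_prob[None]


if __name__ == "__main__":
    pts = np.array([[0.1, 0.1, 0.1], [1.5, 0.2, 0.3], [5.0, 5.0, 5.0]])  # last point is out of range
    labels = occupancy_labels(pts, pc_range=(0, 0, 0, 2, 2, 2), voxel_size=(1, 1, 1))
    logits = np.where(labels == 1, 3.0, -3.0)  # stand-in for a network's occupancy head
    print("occupied voxels:", int(labels.sum()))
    print("loss:", occupancy_bce(logits, labels))
    feats = np.ones((4,) + labels.shape)
    enhanced = enhance_features(feats, 1.0 / (1.0 + np.exp(-logits)))
    print("enhanced shape:", enhanced.shape)
```

Framing label generation as per-voxel classification means any standard classification loss can be plugged in; the multiplicative gating is only one plausible way to let occupancy estimates suppress features in empty space.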