Stands on Shoulders of Giants: Learning to Lift 2D Detection to 3D with Geometry-Driven Objectives

Published: 2025, Last Modified: 04 Nov 2025. Venue: ICRA 2025. License: CC BY-SA 4.0
Abstract: 3D vehicle detection is an essential component of autonomous driving. However, collecting supervised training data for 3D vehicle detectors is costly (e.g., it requires expensive LiDAR sensors) and labor-intensive (it requires human annotation). In comparison, 2D object detection is well developed, boasting stable and robust performance across numerous fields thanks to the large scale (i.e., the number of samples) of existing 2D detection training datasets. Hence, we propose to realize 3D detection by leveraging the robustness of 2D detectors and developing a network that lifts 2D detections to 3D. Our approach can be built upon various backbone models (e.g., models that take image regions detected by a 2D detector as input and predict the corresponding 3D bounding boxes, or existing monocular 3D detection models that produce 2D bounding boxes as an intermediate output). On top of these backbones, we propose several geometry-driven objectives, including a projection consistency loss, a geometry depth loss, and an opposite bin loss, to improve the training of 2D-to-3D lifting. Our extensive experimental results demonstrate that the proposed geometry-driven objectives not only contribute to superior 3D detection results but also provide better generalizability across datasets.
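The abstract names a projection consistency loss but does not define it. As an illustration only, the sketch below shows one common way such a term can be formed: project the eight corners of a predicted 3D box through a pinhole camera, take the tight 2D box around them, and penalize its L1 distance to the detected 2D box. The function names, the (h, w, l) dimension order, the yaw-about-the-vertical-axis convention, and the L1 form are all assumptions made for this example and are not taken from the paper.

```python
import math


def box3d_corners(center, dims, yaw):
    """Eight corners of a 3D box in camera coordinates (x right, y down, z forward).

    Assumed conventions: dims = (height, width, length); yaw rotates about the y axis.
    """
    cx, cy, cz = center
    h, w, l = dims
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (h / 2, -h / 2):
            for dz in (w / 2, -w / 2):
                # Rotate the local offset about the y axis, then translate.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners


def project_to_2d_box(corners, fx, fy, u0, v0):
    """Tight 2D box (u_min, v_min, u_max, v_max) around the projected corners.

    Assumes a pinhole camera and that every corner lies in front of it (z > 0).
    """
    us = [fx * x / z + u0 for x, _, z in corners]
    vs = [fy * y / z + v0 for _, y, z in corners]
    return (min(us), min(vs), max(us), max(vs))


def projection_consistency_l1(box2d, center, dims, yaw, fx, fy, u0, v0):
    """Mean absolute difference between the projected 3D box and the detected 2D box."""
    projected = project_to_2d_box(box3d_corners(center, dims, yaw), fx, fy, u0, v0)
    return sum(abs(p - q) for p, q in zip(projected, box2d)) / 4.0
```

In a training setup this quantity would be computed with a differentiable tensor library so that gradients flow back to the predicted center, dimensions, and orientation; the pure-Python version here only illustrates the geometry.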