Roadside Monocular 3D Detection via 2D-Detection Prompting

15 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: roadside 3D detection, monocular 3D detection, object detection
Abstract: Roadside monocular 3D detection requires detecting objects of classes of interest (e.g., vehicles and pedestrians) in a 2D RGB frame and predicting their 3D information, such as Bird's-Eye-View (BEV) locations. It has broad applications including traffic control, vehicle-to-vehicle communication, and vehicle-infrastructure cooperative perception. To approach this problem, we present a simple, novel method that significantly outperforms prior art by exploiting 2D detections to aid 3D detection, based on two key insights. First, 2D detectors are much easier to train and perform significantly better than 3D detectors when measured on the 2D image plane. Second, the abundance of publicly available 2D-box-annotated datasets allows pretraining a strong base detector that, once finetuned, yields a much better 2D detector for the roadside dataset. To exploit the 2D detector for 3D detection, we explore three techniques: (1) concatenating the 2D and 3D detectors' features, (2) prompting and attentively fusing the 2D and 3D detectors' features, and (3) prompting and encoding the predicted 2D boxes' {x, y, width, height, label} and attentively fusing them with the 3D detector's features. Surprisingly, the third performs significantly better than the others. We conjecture that prompting with 2D detections gives the 3D detector pinpointed object targets, from which it learns to inflate them into BEV 3D detections. Moreover, we propose a class-grouping strategy that merges classes by functionality, which yields further improvements. Comprehensive ablation studies and extensive experiments demonstrate that our method achieves state-of-the-art results on two existing large-scale roadside 3D detection benchmarks.
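
To make technique (3) concrete, below is a minimal PyTorch sketch of one plausible way to encode predicted 2D boxes' {x, y, width, height, label} as prompt tokens and attentively fuse them with the 3D detector's features via cross-attention. All module names, dimensions, and the MLP/attention layout here are illustrative assumptions, not the authors' exact design.

```python
# Hedged sketch of "2D-detection prompting": encode each predicted 2D box
# (geometry + class label) as a prompt token, then let the 3D detector's
# query features cross-attend to these tokens. Shapes and hyperparameters
# are assumptions for illustration only.
import torch
import torch.nn as nn


class BoxPromptFusion(nn.Module):
    def __init__(self, num_classes: int, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        # Embed the geometric part (x, y, w, h) with a small MLP ...
        self.box_mlp = nn.Sequential(
            nn.Linear(4, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        # ... and the predicted class label with a lookup table.
        self.label_emb = nn.Embedding(num_classes, d_model)
        # 3D-detector features (queries) attend to the 2D-box prompt tokens.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, feats_3d, boxes_2d, labels_2d):
        # feats_3d:  (B, N, d_model) query features from the 3D detector head
        # boxes_2d:  (B, M, 4) predicted 2D boxes as normalized (x, y, w, h)
        # labels_2d: (B, M) predicted class indices from the 2D detector
        prompts = self.box_mlp(boxes_2d) + self.label_emb(labels_2d)  # (B, M, d_model)
        fused, _ = self.cross_attn(query=feats_3d, key=prompts, value=prompts)
        return self.norm(feats_3d + fused)  # residual fusion


# Example usage with random tensors.
fuser = BoxPromptFusion(num_classes=10)
feats = torch.randn(2, 100, 256)           # 100 3D-detector queries per image
boxes = torch.rand(2, 20, 4)               # 20 predicted 2D boxes per image
labels = torch.randint(0, 10, (2, 20))
out = fuser(feats, boxes, labels)          # (2, 100, 256)
```

The design choice this illustrates is that the 2D boxes enter the 3D detector as explicit, pinpointed prompts rather than as dense feature maps, which matches the paper's conjecture about why technique (3) outperforms feature concatenation or feature-level fusion.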
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 91