Multi-Camera 3D Object Detection for Autonomous Driving Using Deep Learning and Self-Attention Mechanism

Published: 01 Jan 2023, Last Modified: 28 May 2025, Venue: IEEE Access 2023, License: CC BY-SA 4.0
Abstract: Without depth-centric sensors, 3D object detection from conventional cameras alone is ill-posed and inaccurate because RGB images carry no depth information. We propose a multi-camera perception solution that predicts the 3D properties of a vehicle by aggregating information from multiple static, infrastructure-installed cameras. A multi-bin regression loss has previously been adopted to predict the orientation of a 3D bounding box with a convolutional neural network, but combining it with the geometric constraints of a 2D bounding box to form a 3D bounding box is not accurate enough across all driving scenarios and orientations. This paper leverages a vision transformer that overcomes the drawbacks of convolutional neural networks when no external LiDAR or pseudo-LiDAR pre-trained datasets are available for depth map estimation, particularly in occluded regions. We combine the 3D boxes predicted from the individual cameras using a weighted average score algorithm and select the bounding box with the highest confidence score. We present a comprehensive simulation-based performance analysis using data in the KITTI standard format generated with the CARLA simulator.
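The abstract does not give the details of the score-combination step, so the following is only a minimal sketch of one way such a selection could work, not the authors' algorithm. It assumes each camera contributes one candidate 3D box with a detector confidence, and that each camera has a trust weight. The names `Box3D`, `select_best_box`, and `camera_weights` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box3D:
    """One camera's 3D box estimate (hypothetical container, not from the paper)."""
    camera_id: str
    center: tuple      # (x, y, z) in a shared world frame
    size: tuple        # (length, width, height)
    yaw: float         # heading angle in radians
    confidence: float  # detector score in [0, 1]


def select_best_box(boxes, camera_weights):
    """Return (best_box, weighted_mean_score).

    Assumption: a camera's score is its weight times the box confidence;
    cameras missing from camera_weights default to a weight of 1.0.
    """
    if not boxes:
        raise ValueError("at least one candidate box is required")

    def weight(box):
        return camera_weights.get(box.camera_id, 1.0)

    def weighted_score(box):
        return weight(box) * box.confidence

    total_weight = sum(weight(b) for b in boxes)
    if total_weight <= 0:
        raise ValueError("camera weights must sum to a positive value")

    # Weighted average confidence across all cameras.
    mean_score = sum(weighted_score(b) for b in boxes) / total_weight
    # The candidate with the highest weighted score is kept as the final box.
    best = max(boxes, key=weighted_score)
    return best, mean_score


candidates = [
    Box3D("cam_a", (10.0, 2.0, 0.8), (4.2, 1.8, 1.5), 0.10, 0.9),
    Box3D("cam_b", (10.1, 2.1, 0.8), (4.3, 1.8, 1.5), 0.12, 0.6),
]
best_box, mean_score = select_best_box(candidates, {"cam_a": 0.5, "cam_b": 1.0})
```

In this sketch a heavily down-weighted camera can lose to a less confident but more trusted one. How the weights would actually be chosen (for example from occlusion or viewing distance) is left open here because the abstract does not say.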