Abstract: Monocular 3-D object detection is a low-cost yet challenging task for autonomous vehicles and robotics. Using a single monocular image for 3-D object detection serves as an auxiliary module for autonomous vehicles and has recently attracted growing interest. Currently, expensive lidar and stereo cameras dominate accurate 3-D object detection, whereas monocular-based methods perform considerably worse. We minimize this performance gap by reformulating the monocular-based method as a single integrated network. We exploit the correlation between the 2-D and 3-D detection spaces, enabling 3-D boxes to leverage feature maps generated in image space. The 2-D and 3-D proposals are extracted through a proposal generation network and are then enhanced and used to estimate accurate 3-D detection and localization. Experimental results on the KITTI dataset demonstrate that, in comparison to other monocular object detection methods, the proposed method considerably improves the accuracy of 3-D object detection. The mean average precision of 3-D object detection is improved to 25% in the front view and to 32% in the bird's-eye view for the car class at the moderate difficulty level.
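The abstract does not specify the network architecture, but the core idea of letting 3-D boxes reuse feature maps computed in image space can be illustrated with a minimal sketch. The sketch below is an assumption for illustration only: all module names, channel counts, and output parameterizations (a 4-parameter 2-D box and a 7-parameter 3-D box per anchor) are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SharedProposalHead(nn.Module):
    """Hypothetical sketch: one head predicts 2-D and 3-D box parameters
    from the same image-space feature map, so the 3-D branch reuses
    features computed for 2-D detection."""

    def __init__(self, in_channels: int = 256, num_anchors: int = 9):
        super().__init__()
        self.shared = nn.Conv2d(in_channels, 256, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(256, num_anchors, kernel_size=1)        # objectness per anchor
        self.box2d = nn.Conv2d(256, num_anchors * 4, kernel_size=1)  # (x, y, w, h)
        self.box3d = nn.Conv2d(256, num_anchors * 7, kernel_size=1)  # (x, y, z, w, h, l, yaw)

    def forward(self, feat: torch.Tensor):
        f = torch.relu(self.shared(feat))
        return self.cls(f), self.box2d(f), self.box3d(f)

# Example usage with a dummy backbone feature map (assumed stride-16 output).
feat = torch.randn(1, 256, 24, 78)
scores, boxes2d, boxes3d = SharedProposalHead()(feat)
print(scores.shape, boxes2d.shape, boxes3d.shape)
```

In such a design, the 2-D and 3-D proposals come from a shared feature representation, which is one plausible way to realize the correlation between the two detection spaces described above.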