LVP: Leverage Virtual Points in Multimodal Early Fusion for 3-D Object Detection

Published: 01 Jan 2025 · Last Modified: 15 May 2025 · IEEE Trans. Geosci. Remote. Sens. 2025 · CC BY-SA 4.0
Abstract: Due to the sparsity and occlusion of point clouds, pure point-cloud detectors have limited effectiveness on such samples. Researchers have therefore been actively exploring the fusion of multimodal data to address this LiDAR-based bottleneck. In particular, virtual points, generated through depth completion from front-view RGB images, offer the potential for tighter integration with point clouds. Nevertheless, recent approaches fuse the two modalities within regions of interest (RoIs), which limits fusion effectiveness because the point-cloud branch often produces inaccurate RoIs, especially on hard samples. To overcome this and unleash the potential of virtual points, while still combining late fusion, we present Leverage Virtual Points (LVP), a high-performance 3-D object detector that leverages virtual points in early fusion to improve the quality of RoI generation. LVP consists of three early-fusion modules: virtual points painting (VPP), virtual points auxiliary (VPA), and virtual points completion (VPC), which together achieve both point-level and global-level fusion. The integration of these modules improves occlusion handling and the detection of distant small objects. On the KITTI benchmark, LVP achieves 85.45% 3-D mAP. On the larger nuScenes dataset, it improves detection accuracy for large objects by compensating for errors in depth estimation. Without bells and whistles, these results establish LVP as a strong solution for 3-D outdoor object detection.
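The core idea behind virtual points, as described in the abstract, is to back-project a depth-completed RGB image into a dense 3-D point cloud whose points can be "painted" with image features. A minimal sketch of this step is shown below; the function name, the pinhole back-projection details, and the RGB painting are our own illustration under standard camera-model assumptions, not the paper's actual implementation.

```python
import numpy as np

def depth_to_virtual_points(depth, K, rgb=None, stride=4):
    """Back-project a dense depth map (e.g. from a depth-completion
    network) into 3-D virtual points in the camera frame.

    depth  : (H, W) array of metric depths
    K      : (3, 3) camera intrinsic matrix
    rgb    : optional (H, W, 3) image whose colors "paint" each point
    stride : subsampling step to keep the virtual cloud tractable
    """
    H, W = depth.shape
    us, vs = np.meshgrid(np.arange(0, W, stride), np.arange(0, H, stride))
    us, vs = us.ravel(), vs.ravel()
    z = depth[vs, us]
    valid = z > 0                      # keep only pixels with a valid depth
    us, vs, z = us[valid], vs[valid], z[valid]
    # Pinhole back-projection: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (us - K[0, 2]) * z / K[0, 0]
    y = (vs - K[1, 2]) * z / K[1, 1]
    pts = np.stack([x, y, z], axis=1)
    if rgb is not None:                # point-level "painting" with color
        pts = np.concatenate([pts, rgb[vs, us]], axis=1)
    return pts
```

The resulting virtual points would then be merged with the raw LiDAR sweep before RoI generation, which is the early-fusion stage the abstract contrasts with RoI-level fusion.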