Keywords: computer vision, 3d detection, multi-modal, point clouds
Abstract: Current LiDAR-only 3D detection methods inevitably suffer from the sparsity of point clouds. Sparse point clouds can confuse detectors, as they lack sufficient geometric and semantic information. Many multi-modal methods have been proposed to alleviate this issue, but the differing representations of images and point clouds make them difficult to fuse, resulting in suboptimal performance. In this paper, we present a new multi-modal framework named SFD (Sparse Fuse Dense) to tackle these issues. Specifically, we propose to enhance the sparse point clouds generated by LiDAR with dense pseudo point clouds generated by depth completion. To make full use of the information in the different types of point clouds, we design a new RoI feature fusion method, 3D-GAF (3D Grid-wise Attentive Fusion), which fuses the 3D RoI features of the paired point clouds in a grid-wise attentive manner. In addition, we devise a CPFE (Color Point Feature Extractor) to extract both 3D geometric and 2D semantic features from pseudo point clouds. Moreover, we introduce a multi-modal data augmentation method named SynAugment, which enables the use of all data augmentation approaches tailored to LiDAR-only methods. Our method holds the highest entry on the KITTI 3D object detection leaderboard∗, demonstrating the effectiveness of SFD. Code will be made public.
One-sentence Summary: We propose a new multi-modal framework that enhances sparse raw point clouds with dense pseudo point clouds generated from depth completion.
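To make the grid-wise attentive idea concrete, below is a minimal sketch of how per-grid attention could blend RoI features from raw and pseudo point clouds. This is not the paper's implementation: the function name, the tiny linear gate (`w`, `b`), and the feature shapes are all assumptions for illustration only.

```python
import numpy as np

def grid_wise_attentive_fusion(feat_raw, feat_pseudo, w, b):
    """Illustrative sketch of grid-wise attentive fusion (3D-GAF-style).

    feat_raw, feat_pseudo: (G, C) per-grid RoI features from the raw and
    pseudo point clouds, where G is the number of 3D grid cells in an RoI.
    w (2C, 2), b (2,): parameters of an assumed linear gate.
    Each grid cell receives its own pair of attention weights, so the two
    modalities are mixed cell by cell rather than once per whole RoI.
    """
    concat = np.concatenate([feat_raw, feat_pseudo], axis=-1)   # (G, 2C)
    logits = concat @ w + b                                     # (G, 2)
    # softmax over the two modalities, computed independently per grid cell
    ex = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn = ex / ex.sum(axis=-1, keepdims=True)                  # rows sum to 1
    # convex combination of the two feature sets, weighted per cell
    fused = attn[:, :1] * feat_raw + attn[:, 1:] * feat_pseudo  # (G, C)
    return fused
```

Because the two attention weights for each cell sum to one, the fused feature is always an element-wise convex combination of the raw and pseudo features for that grid cell.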