Semantic-aware Fine-grained Point Augmentation for 3D Multi-modal Object Detection

Published: 2025, Last Modified: 09 Nov 2025, Venue: ICME 2025, License: CC BY-SA 4.0
Abstract: 3D object detection aims to locate and recognize objects in a point cloud and is a fundamental task in autonomous driving. However, the sparsity of the point cloud poses a significant challenge, especially for distant and small objects. Existing methods employ depth estimation networks to generate pseudo points that increase point density, but this introduces significant computational cost and noise, limiting the performance gains. In this paper, we propose the Semantic-aware Fine-grained Point Augmentation (SFPA) approach for 3D object detection, which simultaneously enriches the point cloud with high-quality points and filters out noisy ones, and incorporates multi-modal feature fusion to enhance detection performance. Specifically, we use a semantic segmentation model to generate object masks from RGB images, and use these foreground masks to refine dense depth maps estimated from sparse LiDAR points and RGB images. High-quality pseudo point clouds, concentrated solely on foreground objects, are then generated by projecting the refined dense depth maps back to 3D coordinates. Furthermore, we use the projection matrix as an alignment strategy to concatenate or add dense RGB features to point features, further improving detection performance for extremely sparse objects. Experimental results demonstrate that our method achieves state-of-the-art performance on the KITTI 3D object detection leaderboard, with 95.44%, 88.18%, and 85.53% for the Car category at the easy, medium, and hard levels, respectively.
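The two geometric steps described in the abstract, lifting masked depth pixels to 3D and aligning image features with points through projection, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` (the paper works with the KITTI projection matrix), a binary foreground mask, and depth in metres. The function names `backproject_foreground`, `project_to_pixel`, and `fuse_features` are illustrative only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_foreground(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift foreground depth pixels to 3D camera coordinates.

    Assumes a pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, keep) in enumerate(zip(depth_row, mask_row)):
            # Drop background pixels and invalid (non-positive) depths.
            if not keep or z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def project_to_pixel(
    point: Point, fx: float, fy: float, cx: float, cy: float
) -> Tuple[int, int]:
    """Project a 3D camera-frame point to the nearest pixel (u, v)."""
    x, y, z = point
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    return u, v


def fuse_features(
    point_feat: Sequence[float],
    image_feat: Sequence[float],
    mode: str = "concat",
) -> List[float]:
    """Combine a point feature with the image feature at its projected pixel."""
    if mode == "concat":
        return list(point_feat) + list(image_feat)
    if mode == "add":
        if len(point_feat) != len(image_feat):
            raise ValueError("'add' fusion needs features of equal length")
        return [p + i for p, i in zip(point_feat, image_feat)]
    raise ValueError(f"unknown fusion mode: {mode}")


# Toy 2x2 example: one invalid depth, one background pixel.
depth_map = [[0.0, 10.0], [5.0, 20.0]]
fg_mask = [[1, 1], [0, 1]]
pseudo_points = backproject_foreground(depth_map, fg_mask, 2.0, 2.0, 0.0, 0.0)
```

In this toy case only two of the four pixels survive, which mirrors the idea of keeping pseudo points on foreground objects only. Whether to concatenate or add features is a design choice: concatenation preserves both modalities at the cost of a wider feature, while addition requires matching dimensions.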