PLPFusion: Plane-Line-Pixel Fully Sparse Fusion for Robust Multi-Modal 3D Object Detection

Jingfu Hou, Hong Song, Jinfu Li, Yucong Lin, Tianyu Huang, Jugang He, Xiuwei He, Jian Yang

Published: 01 Jan 2025, Last Modified: 27 Jan 2026IEEE Transactions on Circuits and Systems for Video TechnologyEveryoneRevisionsCC BY-SA 4.0

Abstract: Fully sparse fusion makes an excellent balance between efficiency and accuracy in multi-modal 3D object detection. However, most existing methods focus on foreground objects while overlooking background context. This oversight compromises detection robustness, especially for occluded or small-sized objects, leading to suboptimal detection performance. To address this limitation, we propose a novel fully sparse fusion framework (PLPFusion), which introduces a hierarchical Plane-Line-Pixel representation to progressively model the object-context relationships. PLPFusion comprises three key modules: the Plane Enhancement Module (PEM), the Line Alignment Module (LAM) and the Pixel-Level Aggregation Module (PLAM). Firstly, PEM utilizes geometric cues from LiDAR feature planes to generate spatially-aware object queries. Secondly, LAM further refines these queries with geometric priors for semantic awareness. Lastly, PLAM aggregates pixel-level context to enhance discriminative completeness by leveraging the semantically-aware object queries. On the nuScenes benchmark, PLPFusion achieves 71.9% mAP and 74.0% NDS, outperforming the baseline method FUTR3D by +2.5% mAP and +1.9% NDS, respectively. On the KITTI benchmark, it achieves 72.68% BEV mAP and 67.39% 3D mAP. These results confirm its robustness and effectiveness in diverse multi-modal 3D scenarios. The code of PLPFusion is available on the https://github.com/Text357/PLPFusion.

External IDs:doi:10.1109/tcsvt.2025.3644414