Rethinking How to Capture Long-Range Dependency in 3D Object Detection

Published: 2025 · Last Modified: 12 Nov 2025 · IEEE Trans. Circuits Syst. Video Technol. 2025 · CC BY-SA 4.0
Abstract: LiDAR-based 3D object detection is essential for autonomous driving. Existing high-performance 3D object detectors usually design complex structures in the 3D backbone to capture long-range dependencies among features. However, introducing these complex structures into the 3D backbone significantly increases computational cost and inference latency, limiting the efficiency and practicality of detectors in real-world applications. In this work, we rethink the long-range dependency capturing problem from a new perspective, that is, transferring this task from the 3D backbone to the 2D feature space. To accomplish this goal, we propose a Long-Range Dense Feature Capture Network (LDFCNet). LDFCNet retains the basic structure of the 3D backbone to extract preliminary 3D features but shifts the complex long-range dependency capturing task onto a 2D dense feature map, thereby enhancing detection performance while reducing computational cost. Importantly, a robust 2D dense feature capture (2D-DFC) backbone is devised to capture long-range dependencies effectively and efficiently. In addition, we introduce a re-parameterization technique to decouple the training and inference structures of the 2D backbone, further reducing inference latency. We conduct extensive experiments on the Waymo Open and nuScenes datasets, and the results show that LDFCNet achieves competitive performance. Notably, LDFCNet is $1.5\times$ faster than the state-of-the-art hybrid detector HEDNet and $2.1\times$ faster than the transformer-based detector DSVT. Code and results are released at https://github.com/asd291614761/LDFCNet.
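The abstract does not spell out the re-parameterization scheme, but the standard structural re-parameterization idea (as popularized by RepVGG) merges parallel training-time convolution branches into a single inference-time kernel, so the deployed network pays for only one convolution per block. The sketch below is illustrative, not the paper's actual implementation: it fuses a 3x3 branch and a 1x1 branch (single channel, NumPy) and checks that the fused kernel reproduces the training-time output.

```python
import numpy as np

def conv2d(x, w):
    """Naive single-channel 'same' convolution (cross-correlation)."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w3 = rng.standard_normal((3, 3))  # 3x3 training branch
w1 = rng.standard_normal((1, 1))  # parallel 1x1 training branch

# Training-time forward: sum of the parallel branches.
y_train = conv2d(x, w3) + conv2d(x, w1)

# Inference-time: fold the 1x1 kernel into the center of the 3x3 kernel,
# leaving a single convolution with identical output.
w_fused = w3.copy()
w_fused[1, 1] += w1[0, 0]
y_infer = conv2d(x, w_fused)

assert np.allclose(y_train, y_infer)
```

Because convolution is linear, the fusion is exact, which is what lets the 2D backbone train with a richer multi-branch topology and deploy a cheaper plain one.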