Keywords: Detection, Lidar, Point-Clouds
Abstract: Accurate 3D object detection and segmentation from LiDAR point clouds require both global context and fine-grained local features. Sparse convolutions capture local geometry efficiently but have limited receptive fields, while transformers model long-range context at high memory and runtime costs and often miss fine detail. We introduce Dilated Uniform Attention with 3D Sparse Convolution (DUA-SConv), a building block that integrates attention and sparse convolution in a complementary way. Each block applies self-attention over a uniformly dilated neighborhood spanning a large, fixed region to provide coarse global context, followed by sparse convolution to recover fine-grained local features. Stacked DUA-SConv blocks form a compact backbone that achieves high accuracy in 3D detection and segmentation with low runtime and parameter count.
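The dilated-neighborhood attention idea described above can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the authors' implementation: `dilated_uniform_neighbors`, the `window`/`dilation` parameters, and the dense feature stand-ins are all assumptions for illustration; a real DUA-SConv block would operate on sparse voxel features and pair this attention step with a 3D sparse convolution.

```python
import numpy as np

def dilated_uniform_neighbors(query, window=8, dilation=4):
    # Hypothetical helper: sample offsets on a uniformly dilated 3D grid
    # spanning [-window, window] with step `dilation`, so a small number
    # of neighbors covers a large, fixed region around the query voxel.
    r = np.arange(-window, window + 1, dilation)
    offs = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1).reshape(-1, 3)
    return query + offs

def attention(q, K, V):
    # Plain scaled dot-product attention over the sampled neighborhood.
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
query = np.array([0, 0, 0])
nbrs = dilated_uniform_neighbors(query)       # coarse, wide receptive field
feats = rng.standard_normal((len(nbrs), 16))  # stand-in voxel features
out = attention(rng.standard_normal(16), feats, feats)
print(nbrs.shape, out.shape)  # → (125, 3) (16,)
```

With `window=8` and `dilation=4`, only 125 neighbors cover a 17-voxel-wide cube, which is the trade-off the abstract points to: coarse global context from few attended positions, with fine local detail left to the subsequent sparse convolution.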
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6939