Keywords: Detection, Lidar, Point-Clouds
Abstract: Accurate 3D object detection and segmentation from LiDAR point clouds require both global context and fine-grained local features. Sparse convolutions capture local geometry efficiently but have limited receptive fields, while transformers model long-range context at high memory and runtime costs and often miss fine detail. We introduce Dilated Uniform Attention with 3D Sparse Convolution (DUA-SConv), a building block that integrates attention and sparse convolution in a complementary way. Each block applies self-attention over a uniformly dilated neighborhood spanning a large, fixed region to provide coarse global context, followed by sparse convolution to recover fine-grained local features. Stacked DUA-SConv blocks form a compact backbone that achieves high accuracy in 3D detection and segmentation with low runtime and parameter count.
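The dilated-neighborhood attention idea described above can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the authors' implementation: `dilated_uniform_neighbors`, the `window`/`dilation` parameters, and the dense feature stand-ins are all assumptions for illustration; a real DUA-SConv block would operate on sparse voxel features and pair this attention step with a 3D sparse convolution.

```python
import numpy as np

def dilated_uniform_neighbors(query, window=8, dilation=4):
    # Hypothetical helper: sample offsets on a uniformly dilated 3D grid
    # spanning [-window, window] with step `dilation`, so a small number
    # of neighbors covers a large, fixed region around the query voxel.
    r = np.arange(-window, window + 1, dilation)
    offs = np.stack(np.meshgrid(r, r, r, indexing="ij"), axis=-1).reshape(-1, 3)
    return query + offs

def attention(q, K, V):
    # Plain scaled dot-product attention over the sampled neighborhood.
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
query = np.array([0, 0, 0])
nbrs = dilated_uniform_neighbors(query)       # coarse, wide receptive field
feats = rng.standard_normal((len(nbrs), 16))  # stand-in voxel features
out = attention(rng.standard_normal(16), feats, feats)
print(nbrs.shape, out.shape)  # → (125, 3) (16,)
```

With `window=8` and `dilation=4`, only 125 neighbors cover a 17-voxel-wide cube, which is the trade-off the abstract points to: coarse global context from few attended positions, with fine local detail left to the subsequent sparse convolution.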
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 6939