PointLAM: Local Attentive Mamba for Efficient Point-based 3D Object Detection

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: 3D Object Detection, Computational Efficiency, State Space Models
TL;DR: An effective and efficient 3D backbone for point-based 3D object detection.
Abstract: 3D object detection from LiDAR faces a fundamental trade-off between computational efficiency and the preservation of fine-grained geometry. The dominant voxel-based paradigm achieves efficiency by quantizing massive point clouds, but at the cost of inevitable information loss. Conversely, point-based methods excel at capturing precise geometry by directly processing raw points, yet have been constrained by the prohibitive complexity of their core operators for downsampling and spatial feature modeling. In this work, we tackle this dilemma by introducing PointLAM, a novel framework for point-based 3D object detection that excels in both performance and efficiency. We systematically address the long-standing bottlenecks of point-based models through two synergistic designs. First, we propose a Dynamic Point Sampler (DPS) that intelligently curates an information-rich and structurally representative subset of raw points. It leverages a novel Deviation Network (DevNet) to capture each point's local distinctiveness, followed by a Doubly Sorted Sampling (DSS) strategy that retains the most informative points to reduce the workload of the 3D backbone. Second, our 3D backbone combines Bi-Directional Mamba (BDM) layers for global context modeling with novel, lightweight Local Multiplicative Aggregation (LMA) layers that efficiently capture intricate local geometry without computationally expensive neighborhood queries. Extensive experiments show that PointLAM sets a new benchmark for efficient point-based 3D object detection. On both the nuScenes and Waymo datasets, PointLAM not only significantly surpasses prior point-based models but also achieves performance comparable to strong voxel-based competitors such as LION and DSVT. Crucially, these competitive results are achieved with a fraction of the model parameters and latency, demonstrating a superior balance between accuracy and efficiency.
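The abstract describes scoring each point's local distinctiveness (DevNet) and then retaining the highest-scoring points (DSS). The actual DevNet is a learned network and the DSS details are not given here, so the sketch below is only a hand-crafted illustration of that sampling idea: a geometric stand-in score (each point's deviation from its neighborhood centroid) followed by top-ranked selection. The function names `deviation_scores` and `sample_top_points` are hypothetical, not from the paper.

```python
import numpy as np

def deviation_scores(points: np.ndarray, k: int = 8) -> np.ndarray:
    # Hand-crafted stand-in for the learned DevNet: score each point by
    # how far it deviates from the centroid of its k nearest neighbors.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]       # indices of k nearest neighbors
    centroids = points[nn].mean(axis=1)           # local neighborhood centroids
    return np.linalg.norm(points - centroids, axis=1)

def sample_top_points(points: np.ndarray, ratio: float = 0.5, k: int = 8):
    # Loosely mirrors "retain the most informative points": rank by the
    # distinctiveness score and keep the top fraction of the cloud.
    scores = deviation_scores(points, k)
    m = max(1, int(len(points) * ratio))
    keep = np.argsort(-scores)[:m]                # highest-scoring points first
    return points[keep], scores[keep]
```

A sampler like this would shrink the point set handed to the 3D backbone; the paper's version additionally aims for structural representativeness, which this naive top-k selection does not guarantee.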
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 11762