RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation
Keywords: Point Cloud Segmentation, Visual Foundation Models, Computer Vision, Machine Learning, LiDAR Segmentation
TL;DR: RangeSAM adapts SAM2 for fast, efficient LiDAR point cloud segmentation using range-view representations, achieving competitive results on SemanticKITTI.
Abstract: LiDAR point cloud segmentation is central to autonomous driving and 3D scene understanding. While voxel- and point-based methods dominate recent research due to their compatibility with deep architectures and their ability to capture fine-grained geometry, they often incur high computational cost, irregular memory access, and limited runtime efficiency due to scaling issues. In contrast, range-view methods, though relatively underexplored, can leverage mature 2D semantic segmentation techniques for fast and accurate predictions. Motivated by the rapid progress of Visual Foundation Models (VFMs) in captioning, zero-shot recognition, and multimodal tasks, we investigate whether SAM2, the current state-of-the-art VFM for segmentation tasks, can serve as a strong backbone for LiDAR point cloud segmentation in the range-view representation. We present $\textbf{RangeSAM}$, to our knowledge the first range-view framework that adapts SAM2 to 3D segmentation, coupling efficient 2D feature extraction with projection and back-projection to operate on point clouds. To optimize SAM2 for range-view representations, we introduce several architectural modifications to the encoder: (1) a novel $\textbf{Stem}$ module that emphasizes the horizontal spatial dependencies inherent in LiDAR range images, (2) a customized configuration of $\textbf{Hiera Blocks}$ tailored to the geometric properties of spherical projections, and (3) an adapted $\textbf{Window Attention}$ mechanism in the encoder backbone specifically designed to capture the unique spatial patterns and discontinuities present in range-view pseudo-images.
Our approach achieves competitive performance on SemanticKITTI while benefiting from the speed, scalability, and deployment simplicity of 2D-centric pipelines.
This work highlights the viability of VFMs as general-purpose backbones for point cloud segmentation and opens a path toward unified, foundation-model-driven LiDAR segmentation.
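The spherical projection that turns a point cloud into a range-view pseudo-image can be sketched as follows. This is a minimal, illustrative implementation, not the authors' code: the image resolution and SemanticKITTI-style vertical field-of-view defaults (3° up, -25° down), and all function and parameter names, are assumptions.

```python
import numpy as np

def spherical_project(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.

    Returns the range image plus per-point pixel coordinates (u, v),
    which are what a back-projection step would use to map 2D
    predictions back onto the original points.
    """
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-8), -1.0, 1.0))

    # azimuth -> column, elevation -> row
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (1.0 - (pitch - fov_down_rad) / fov) * h

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # keep the nearest return per pixel by writing far-to-near
    range_image = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(depth)[::-1]
    range_image[v[order], u[order]] = depth[order]
    return range_image, u, v
```

A real pipeline would additionally mask points outside the vertical field of view and stack extra channels (x, y, z, intensity) alongside depth; the sketch keeps only the range channel to show the geometry.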
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 4