RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation
Keywords: Point Cloud Segmentation, Visual Foundation Models, Computer Vision, Machine Learning, LiDAR Segmentation
TL;DR: RangeSAM adapts SAM2 for fast, efficient LiDAR point cloud segmentation using range-view representations, achieving competitive results on SemanticKITTI.
Abstract: LiDAR point cloud segmentation is central to autonomous driving and 3D scene understanding. While voxel- and point-based methods dominate recent research due to their compatibility with deep architectures and their ability to capture fine-grained geometry, they often incur high computational cost, irregular memory access, and limited runtime efficiency due to scaling issues. In contrast, range-view methods, though relatively underexplored, can leverage mature 2D semantic segmentation techniques for fast and accurate predictions. Motivated by the rapid progress of Visual Foundation Models (VFMs) in captioning, zero-shot recognition, and multimodal tasks, we investigate whether SAM2, the current state-of-the-art VFM for segmentation tasks, can serve as a strong backbone for LiDAR point cloud segmentation in the range-view representation. We present $\textbf{RangeSAM}$, to our knowledge the first range-view framework that adapts SAM2 to 3D segmentation, coupling efficient 2D feature extraction with projection and back-projection to operate on point clouds. To optimize SAM2 for range-view representations, we introduce several architectural modifications to the encoder: (1) a novel $\textbf{Stem}$ module that emphasizes the horizontal spatial dependencies inherent in LiDAR range images, (2) a customized configuration of $\textbf{Hiera Blocks}$ tailored to the geometric properties of spherical projections, and (3) an adapted $\textbf{Window Attention}$ mechanism in the encoder backbone specifically designed to capture the unique spatial patterns and discontinuities present in range-view pseudo-images.
Our approach achieves competitive performance on SemanticKITTI while benefiting from the speed, scalability, and deployment simplicity of 2D-centric pipelines.
This work highlights the viability of VFMs as general-purpose backbones for point cloud segmentation and opens a path toward unified, foundation-model-driven LiDAR segmentation.
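The spherical projection that turns a point cloud into a range-view pseudo-image can be sketched as follows. This is a minimal, illustrative implementation, not the authors' code: the image resolution and SemanticKITTI-style vertical field-of-view defaults (3° up, -25° down), and all function and parameter names, are assumptions.

```python
import numpy as np

def spherical_project(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an H x W range image.

    Returns the range image plus per-point pixel coordinates (u, v),
    which are what a back-projection step would use to map 2D
    predictions back onto the original points.
    """
    fov_up_rad = np.radians(fov_up)
    fov_down_rad = np.radians(fov_down)
    fov = fov_up_rad - fov_down_rad

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)  # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(depth, 1e-8), -1.0, 1.0))

    # azimuth -> column, elevation -> row
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (1.0 - (pitch - fov_down_rad) / fov) * h

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # keep the nearest return per pixel by writing far-to-near
    range_image = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(depth)[::-1]
    range_image[v[order], u[order]] = depth[order]
    return range_image, u, v
```

A real pipeline would additionally mask points outside the vertical field of view and stack extra channels (x, y, z, intensity) alongside depth; the sketch keeps only the range channel to show the geometry.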
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 4