Leveraging Foundation Models for Labeling Custom Object Masks in LiDAR Point Cloud Sequences

Published: 01 Jan 2025, Last Modified: 12 Nov 2025 · ECMR 2025 · CC BY-SA 4.0
Abstract: 3D segmentation plays a crucial role in the perception stack of autonomous systems, such as mobile robots, by enabling fine-grained understanding of their surroundings. State-of-the-art approaches rely on deep learning and typically require large-scale annotated datasets, the creation of which is both costly and labor-intensive. Recent advances in vision foundation models, such as the Segment Anything Model 2 (SAM 2), demonstrate strong generalization capabilities in segmenting objects across diverse video data. In this work, we present a novel pipeline for generating high-quality pseudo-labels for 3D point cloud segmentation with minimal human supervision. We propose a custom projection that transfers LiDAR point clouds to a 2D image proxy representation using range and reflectivity data. As a result, sequential LiDAR scans can be effectively treated as video input, which allows us to leverage SAM 2 for fast and efficient LiDAR mask generation. Our method produces accurate labels across a variety of object types and enables the training of 3D segmentation models solely on these semi-automatically generated annotations. Our approach significantly lowers the barrier to applying 3D segmentation in custom domains, especially for object categories not covered in existing public datasets.
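The abstract does not specify the details of the custom projection. As an illustration only, a standard spherical (range-image) projection that maps a LiDAR scan to a two-channel range/reflectivity image could be sketched as follows; the image resolution and vertical field-of-view values below are hypothetical placeholders, not the paper's parameters:

```python
import numpy as np

def lidar_to_image(points, reflectivity, h=64, w=1024,
                   fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project a LiDAR scan onto a 2D range/reflectivity image.

    points:       (N, 3) array of x, y, z coordinates
    reflectivity: (N,) array of per-point reflectivity values
    Returns an (h, w, 2) image with range in channel 0 and
    reflectivity in channel 1; pixels without a point stay 0.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)

    # Spherical angles: azimuth (yaw) and elevation (pitch).
    yaw = np.arctan2(y, x)
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-6), -1.0, 1.0))

    fov_up = np.radians(fov_up_deg)
    fov = np.radians(fov_up_deg - fov_down_deg)

    # Map azimuth to columns and elevation to rows.
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (fov_up - pitch) / fov * h
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    # Write far points first so nearer points overwrite them.
    order = np.argsort(r)[::-1]
    img = np.zeros((h, w, 2), dtype=np.float32)
    img[v[order], u[order], 0] = r[order]
    img[v[order], u[order], 1] = reflectivity[order]
    return img
```

Applying such a projection to every scan in a sequence would yield a stack of 2D frames that can be fed to a video segmentation model such as SAM 2; the resulting per-pixel masks can then be mapped back to the originating 3D points via the stored pixel indices.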