Abstract: Availability of datasets is a strong driver for
research on 3D semantic understanding, and whilst obtaining
unlabeled 3D point cloud data is straightforward, manually
annotating this data with semantic labels is time-consuming
and costly. Recently, Vision Foundation Models (VFMs) have
enabled open-set semantic segmentation on camera images,
potentially aiding automatic labeling. However, VFMs for 3D
data have been limited to adaptations of 2D models, which can
introduce inconsistencies into 3D labels. This work introduces Label Any
Pointcloud (LeAP), leveraging 2D VFMs to automatically label
3D data with any set of classes in any kind of application whilst
ensuring label consistency. Point labels are combined into
voxels through a Bayesian update to improve spatio-temporal
consistency. A novel 3D Consistency Network (3D-CN) exploits
3D information to further improve label quality. Through
various experiments, we show that our method can generate
high-quality 3D semantic labels across diverse fields without
any manual labeling. Further, models adapted to new domains
using our labels show an increase of up to 34.2 mIoU in
semantic segmentation tasks.
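
The Bayesian voxel update mentioned above can be illustrated with a short sketch. The snippet below is not the paper's implementation: it assumes class-independent per-point probabilities (e.g. projected onto the point cloud from a 2D VFM), a uniform class prior, and hypothetical names such as `bayesian_voxel_update` and `voxel_size`.

```python
# Minimal sketch (illustrative, not the authors' code) of fusing per-point
# class probabilities into voxel-level posteriors via a Bayesian update.
import numpy as np

def bayesian_voxel_update(points, probs, voxel_size=0.2, eps=1e-9):
    """Fuse per-point class probabilities into per-voxel posteriors.

    points : (N, 3) array of 3D coordinates.
    probs  : (N, C) array of per-point class probabilities.
    Returns a dict mapping voxel index (tuple) -> posterior over C classes.
    """
    num_classes = probs.shape[1]
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    log_posteriors = {}
    for key, p in zip(map(tuple, voxel_idx), probs):
        if key not in log_posteriors:
            # Assumed uniform prior over classes (in log space).
            log_posteriors[key] = np.full(num_classes, -np.log(num_classes))
        # Bayesian update with independent observations: multiply
        # likelihoods, i.e. accumulate log-probabilities.
        log_posteriors[key] += np.log(p + eps)
    # Normalize each voxel's log-posterior back to a probability vector.
    posteriors = {}
    for key, log_post in log_posteriors.items():
        post = np.exp(log_post - log_post.max())
        posteriors[key] = post / post.sum()
    return posteriors

# Example usage with mock data in place of real VFM outputs:
pts = np.random.rand(1000, 3) * 10.0
pr = np.random.dirichlet(np.ones(5), size=1000)
voxels = bayesian_voxel_update(pts, pr, voxel_size=0.5)
```

Observations falling into the same voxel, whether from different viewpoints or different timestamps, sharpen that voxel's posterior, which is one way such a fusion step can improve spatio-temporal consistency.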