Abstract: City-scale point cloud semantic segmentation is an important yet challenging task. Despite progress, existing methods rely heavily on point-wise annotations. An alternative solution is to apply the Unsupervised Domain Adaptation (UDA) approach. Recently, 2D foundation models have achieved significant progress by training on internet-scale images. Therefore, adapting 2D foundation models to 3D city-scale point clouds is an appealing idea. Due to data protection and storage issues, 2D source-domain data is typically unavailable. Thus, we focus on Source-Free Domain Adaptation (SFDA) and propose a source-free city-scale point cloud semantic segmentation method, namely SF-City. Our method leverages knowledge from 2D pre-trained models to generate point-wise pseudo labels for training a 3D semantic segmentation network. We convert point clouds into remote-sensing-like images using Bird’s-Eye-View (BEV) projection. However, directly using source models for pseudo label generation is hindered by domain gaps such as viewpoint variations, concept divergences, and geometry loss. To tackle these problems, we propose a Multi-scale Content Feature Extractor (MCFE) to extract holistic and contextual feature representations. Then, an Uncertainty-guided Inter-Model Feature Integrator (UIFI) is introduced to integrate inherent knowledge across source models. Furthermore, a Geometric-guided Pseudo Label Generator (GPLG) is designed to incorporate geometric information that regularizes the pseudo labels. Through extensive experiments on two public benchmarks, SF-City demonstrates superior performance, achieving an mIoU of 28.8% on the SensatUrban dataset and outperforming the recent state-of-the-art method CLIP-FO3D by about 6.3%.
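To make the point-cloud-to-image step concrete, the following Python snippet is a minimal, hypothetical sketch of a BEV projection, not the paper's implementation: points are binned onto an x-y grid and, per cell, the color of the highest point is kept, yielding a remote-sensing-like image. The 0.2 m cell size and the keep-highest-point rule are assumptions made here for illustration.

```python
# Minimal sketch (assumed, not the paper's code) of a Bird's-Eye-View projection:
# bin points onto an x-y grid and keep the color of the highest point per cell.
import numpy as np

def bev_project(points: np.ndarray, colors: np.ndarray, cell: float = 0.2):
    """points: (N, 3) xyz in meters; colors: (N, 3) uint8 RGB."""
    xy_min = points[:, :2].min(axis=0)
    cols, rows = (np.ceil((points[:, :2].max(axis=0) - xy_min) / cell).astype(int) + 1)
    image = np.zeros((rows, cols, 3), dtype=np.uint8)
    height = np.full((rows, cols), -np.inf)

    # Sort by z so that higher points overwrite lower ones within the same cell.
    order = np.argsort(points[:, 2])
    u, v = ((points[order, :2] - xy_min) / cell).astype(int).T  # column, row indices
    image[v, u] = colors[order]
    height[v, u] = points[order, 2]
    return image, height  # image feeds the 2D source models; height retains geometry
```

In this sketch, the returned image would be passed to the 2D pre-trained source models for pseudo label generation, while the per-cell height map is one plausible way to carry geometric information alongside the projection.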