CSFNet: Cross-Modal Semantic Focus Network for Semantic Segmentation of Large-Scale Point Clouds

Published: 01 Jan 2025, Last Modified: 17 Apr 2025. IEEE Trans. Geosci. Remote Sens., 2025. License: CC BY-SA 4.0
Abstract: Semantic segmentation of large-scale point clouds is an indispensable component of outdoor scene perception, providing essential 3-D semantic insights for applications such as scene reconstruction, urban planning, and autonomous driving. However, the discriminative capability of point cloud features declines with increasing distance from the sensor, so current methods usually perform poorly when segmenting distant objects. To overcome this challenge and improve the differentiation between classes with similar geometric features, we propose the cross-modal semantic focus network (CSFNet). First, we design a multiscale feature dynamic fusion (MDF) module that leverages multiscale image features, enriching the feature representation of point clouds with additional color and texture information from images. Then, to extract the distinguishing features of distant objects and of different categories more efficiently, we propose a semantic focus module (SFM) that employs a multiclass contrastive learning strategy to enhance feature discrimination. Finally, we introduce cross-modal knowledge distillation (KD) to deepen the model's understanding of point clouds. Extensive experiments on the SemanticKITTI and nuScenes datasets demonstrate the effectiveness of our method. Notably, our method achieves superior segmentation accuracy across multiple classes at various distances compared with current methods.
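The abstract mentions cross-modal knowledge distillation but does not spell out the loss. A common formulation (not necessarily the one used in CSFNet) distills softened class logits from an image-informed teacher branch into the point-cloud student branch via a temperature-scaled KL divergence. A minimal NumPy sketch, with all function names and the teacher/student roles being illustrative assumptions:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KL divergence KL(teacher || student), averaged over
    points and scaled by T^2 as in standard knowledge distillation.
    Here the teacher would be the image-fused branch and the student
    the point-only branch (an assumption, not the paper's exact setup)."""
    p = softmax(teacher_logits, temperature)          # teacher distribution
    log_q = np.log(softmax(student_logits, temperature))  # student log-probs
    kl = np.sum(p * (np.log(p) - log_q), axis=-1)     # per-point KL
    return float(temperature ** 2 * kl.mean())

# Toy example: 3 points, 4 classes.
teacher = np.array([[4.0, 1.0, 0.5, 0.2],
                    [0.3, 3.5, 0.1, 0.1],
                    [0.2, 0.1, 2.8, 0.4]])
student = teacher + 0.5 * np.random.default_rng(0).normal(size=teacher.shape)

print(kd_loss(student, teacher))   # positive when branches disagree
print(kd_loss(teacher, teacher))   # zero when they match exactly
```

The `T^2` factor keeps gradient magnitudes comparable across temperatures; in training this term would typically be added to the usual cross-entropy segmentation loss with a weighting coefficient.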