BEV-DG: Cross-Modal Learning under Bird's-Eye View for Domain Generalization of 3D Semantic Segmentation

Miaoyu Li, Yachao Zhang, Xu Ma, Yanyun Qu, Yun Fu

Published: 01 Jan 2023, Last Modified: 16 May 2025 | ICCV 2023 | CC BY-SA 4.0

Abstract: Cross-modal Unsupervised Domain Adaptation (UDA) aims to exploit the complementarity of 2D-3D data to overcome the lack of annotation in an unknown domain. However, training these methods requires access to target samples, so the trained model only works in a specific target domain. In light of this, we propose cross-modal learning under bird’s-eye view for Domain Generalization (DG) of 3D semantic segmentation, called BEV-DG. DG is more challenging because the model cannot access the target domain during training, so it must rely on cross-modal learning to alleviate the domain gap. Since 3D semantic segmentation requires the classification of each point, existing cross-modal learning is conducted directly point-to-point, which makes it sensitive to misalignment in the projections between pixels and points. To address this, our approach optimizes domain-irrelevant representation modeling with the aid of cross-modal learning under bird’s-eye view. We propose BEV-based Area-to-area Fusion (BAF) to conduct cross-modal learning under bird’s-eye view, which has a higher fault tolerance for point-level misalignment. Furthermore, to model domain-irrelevant representations, we propose BEV-driven Domain Contrastive Learning (BDCL), which also builds on cross-modal learning under bird’s-eye view. We design three domain generalization settings based on three 3D datasets, and BEV-DG outperforms state-of-the-art competitors by large margins in all settings.
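
The contrast the abstract draws between point-to-point and area-to-area cross-modal learning can be illustrated with a minimal NumPy sketch. This is not the authors' BAF module. It only shows the general idea of pooling per-point features over bird's-eye-view grid cells, so that a small pixel-to-point projection error matters less than it would in a strict one-to-one pairing. All names (`bev_cell_ids`, `pool_by_cell`, `area_to_area_fuse`), the cell size, the grid range, mean pooling, and fusion by concatenation are assumptions made for illustration.

```python
import numpy as np

def bev_cell_ids(xyz, cell_size=1.0, grid_range=50.0):
    """Map each point to a flat BEV cell index by discretizing x and y (z is ignored)."""
    n_cells = int(np.ceil(2 * grid_range / cell_size))  # cells per axis
    ij = np.floor((xyz[:, :2] + grid_range) / cell_size).astype(np.int64)
    ij = np.clip(ij, 0, n_cells - 1)  # points outside the range go to border cells
    return ij[:, 0] * n_cells + ij[:, 1], n_cells

def pool_by_cell(feats, cell_ids, n_total):
    """Average per-point features over all points that fall into the same BEV cell."""
    sums = np.zeros((n_total, feats.shape[1]), dtype=np.float64)
    counts = np.zeros(n_total, dtype=np.float64)
    np.add.at(sums, cell_ids, feats)   # unbuffered scatter-add handles repeated indices
    np.add.at(counts, cell_ids, 1.0)
    return sums / np.maximum(counts, 1.0)[:, None]  # empty cells stay zero

def area_to_area_fuse(xyz, feats_2d, feats_3d, cell_size=1.0, grid_range=50.0):
    """Hypothetical fusion: each point keeps its own 3D feature and receives the
    area-level (cell-averaged) 2D feature instead of its single paired pixel feature."""
    cell_ids, n_cells = bev_cell_ids(xyz, cell_size, grid_range)
    area_2d = pool_by_cell(feats_2d, cell_ids, n_cells * n_cells)
    fused = np.concatenate([feats_3d, area_2d[cell_ids]], axis=1)
    return fused, cell_ids

# Usage: three points, the first two share a BEV cell.
xyz = np.array([[0.2, 0.3, 0.0], [0.7, 0.6, 1.0], [5.5, 5.5, 0.0]])
feats_2d = np.array([[1.0, 0.0], [3.0, 0.0], [10.0, 10.0]])  # features sampled from the image
feats_3d = np.array([[1.0], [2.0], [3.0]])                   # features from the point cloud
fused, cell_ids = area_to_area_fuse(xyz, feats_2d, feats_3d)
print(fused)
```

In this sketch, the two points in the same cell receive the same averaged 2D feature, so if one of them had been paired with a slightly wrong pixel, the error is diluted by its neighbors rather than passed on unchanged.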