Multi-modal self-supervised contrastive representation learning for three-dimensional point cloud understanding
Abstract: Highlights
• Image and text modalities are used to enhance point cloud understanding.
• The semantic bias among modalities is reduced by global feature and projection units.
• Inter-modal and cross-modal contrastive learning better align multi-modal data.
• Comprehensive experiments show that our algorithm outperforms other algorithms.
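The highlights mention contrastive learning that aligns point cloud features with features from other modalities, but the listing does not give the loss. As an illustration only, the sketch below uses a generic InfoNCE-style objective, a common choice for this kind of alignment and not necessarily the paper's formulation. The function names, the temperature value, and the toy two-dimensional features are all assumptions made for the example.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Generic InfoNCE loss (hypothetical helper, not taken from the paper).

    Row i of `anchors` (e.g. point cloud features) is treated as the match
    for row i of `positives` (e.g. image or text features); every other row
    in `positives` acts as a negative for that anchor.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity between anchor i and each candidate.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability assigned to the true match.
        total += log_denom - logits[i]
    return total / len(a)


# Toy features (made up for illustration): two samples, two dimensions.
point_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[0.9, 0.1], [0.2, 0.8]]

aligned = info_nce(point_feats, image_feats)          # correct pairing
shuffled = info_nce(point_feats, image_feats[::-1])   # mismatched pairing
```

With the correct pairing the loss is lower than with the mismatched pairing, which is the behaviour a contrastive objective rewards: matched features from different modalities are pulled together and unmatched ones pushed apart.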
External IDs: dblp:journals/eaai/DingYLGD25