Abstract: Highlights
• We revisit the key factor of LiDAR-camera fusion, namely the soft joint mechanism.
• We develop an attention-based multimodal fusion method for point cloud segmentation.
• We build multi-scale pairwise inputs and let them interact through a dual-stream transformer.
• We propose unimodal data augmentation and cross-modal contrastive learning.