Abstract: Highlights
• Proposing global contrastive learning based on multimodal representation.
• Devising multiple techniques to define the negatives/positives for each anchor.
• Leveraging label information to conduct supervised contrastive learning.
• Outperforming baselines on multimodal sentiment analysis and humor detection.
• Proposing permutation-invariant fusion that can benefit from complex fusion methods.
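To make the supervised contrastive objective mentioned in the highlights concrete, the following is a minimal sketch of a generic label-based contrastive loss over a batch of fused multimodal embeddings. It is not the authors' formulation: the function name `supervised_contrastive_loss`, the `temperature` value, the use of cosine similarity, and the assumption of discrete labels are all illustrative choices. Samples sharing the anchor's label are treated as positives and all other samples in the batch as negatives.

```python
import math


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's exact loss).

    embeddings: list of equal-length float vectors, one fused multimodal
                representation per sample (assumed to be computed elsewhere).
    labels:     list of discrete labels, one per sample (an assumption here).
    """
    n = len(embeddings)

    # L2-normalise so that the dot product equals cosine similarity.
    normed = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        normed.append([x / norm for x in vec])

    total, num_anchors = 0.0, 0
    for i in range(n):
        # Positives: other samples with the same label as the anchor.
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing

        # Temperature-scaled similarities to every other sample in the batch.
        logits = {
            j: sum(a * b for a, b in zip(normed[i], normed[j])) / temperature
            for j in range(n)
            if j != i
        }

        # Numerically stable log-sum-exp over the denominator.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits.values())
        )

        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1

    return total / num_anchors if num_anchors else 0.0
```

Under these assumptions, the loss is small when samples with the same label lie close together in the embedding space and large when they are scattered among samples with other labels.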