Group Contrastive Learning for Weakly Paired Multimodal Data

Published: 28 May 2026, Last Modified: 28 May 2026
Venue: ICML 2026 FM4LS Workshop Poster
License: CC BY 4.0
Keywords: Multimodal learning, Weakly paired data, Contrastive learning, Single-cell genomics
TL;DR: A semi-supervised multi-modal representation learning framework for weakly paired multi-modal perturbation data
Abstract: We present GROOVE, a semi-supervised representation learning approach for weakly paired multi-modal data, where samples across modalities share perturbation labels but lack instance-level correspondence. Our central contribution is GroupCLIP, a group-level contrastive loss that fills the gap between CLIP (cross-modal, instance-paired) and SupCon (uni-modal, label-supervised) for the weakly-paired regime. We integrate GroupCLIP with on-the-fly backtranslating autoencoders to learn cross-modally entangled, group-coherent representations. We additionally propose a combinatorial benchmarking framework that pairs representation learners with multiple optimal-transport aligners, revealing that no single aligner uniformly dominates. Across simulations and two single-cell genetic perturbation datasets, GROOVE matches or outperforms existing approaches on cross-modal matching and imputation, with ablations showing GroupCLIP as the key driver of performance gains.
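To make the idea of a group-level contrastive loss concrete, the following is a minimal sketch and not the authors' GroupCLIP formulation, which the abstract does not spell out. It assumes a SupCon-style objective applied across modalities: every sample in the other modality that shares the anchor's perturbation label counts as a positive, and the loss is averaged over both directions. The function name `group_contrastive_loss`, the `temperature` parameter, and the averaging scheme are all illustrative assumptions.

```python
import numpy as np


def _log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def _one_direction(logits, pos):
    # For each anchor (row), average the negative log-probability of its positives.
    logp = _log_softmax(logits, axis=1)
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0  # anchors with no same-label sample in the other modality are skipped
    if not valid.any():
        return 0.0
    per_anchor = -(logp * pos).sum(axis=1)[valid] / n_pos[valid]
    return float(per_anchor.mean())


def group_contrastive_loss(za, ya, zb, yb, temperature=0.1):
    """Hypothetical group-level cross-modal contrastive loss (illustrative only).

    za: (n_a, d) embeddings from modality A, ya: (n_a,) group labels.
    zb: (n_b, d) embeddings from modality B, yb: (n_b,) group labels.
    No instance-level pairing between rows of za and zb is assumed.
    """
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature        # (n_a, n_b) cosine similarities, scaled
    pos = ya[:, None] == yb[None, :]        # True where the group labels match
    # Symmetrise over the A-to-B and B-to-A directions, as CLIP does for instance pairs.
    return 0.5 * (_one_direction(logits, pos) + _one_direction(logits.T, pos.T))
```

Under these assumptions, the loss reduces to a CLIP-like objective when every group contains exactly one sample per modality, and to a SupCon-like objective when both inputs come from the same modality, which is the gap the abstract describes.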
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 106