Keywords: multimodal data integration, single-cell analysis, optimal transport, contrastive learning
TL;DR: MatchCLOT is a novel method for modality matching in single-cell data integration that combines contrastive learning with optimal transport.
Abstract: Recent advances in single-cell technologies have enabled the simultaneous quantification of multiple biomolecules in the same cell, opening new avenues for understanding cellular complexity and heterogeneity. However, the resulting multimodal single-cell datasets present unique challenges arising from the high dimensionality of the data and the multiple sources of acquisition noise. In this work, we propose MatchCLOT, a novel method for single-cell data integration based on ideas borrowed from contrastive learning, optimal transport, and transductive learning. In particular, we use contrastive learning to learn a common representation between two modalities and apply entropic optimal transport as an approximate maximum weight bipartite matching algorithm. Our model obtains state-of-the-art performance in the modality matching task from the NeurIPS 2021 multimodal single-cell data integration challenge, improving the previous best competition score by 28.9%.
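To illustrate how entropic optimal transport can act as an approximate maximum weight bipartite matching, the following is a minimal NumPy sketch of Sinkhorn iterations applied to a cross-modality similarity matrix. It is not the MatchCLOT implementation: the function name `sinkhorn_matching`, the uniform marginals, the regularization strength `epsilon`, the iteration count, and the synthetic embeddings standing in for the learned common representation are all assumptions made for this example.

```python
import numpy as np


def sinkhorn_matching(similarity, epsilon=0.05, n_iters=200):
    """Hypothetical helper: entropic OT plan between uniform marginals.

    similarity[i, j] scores how well cell i of modality A matches
    cell j of modality B (higher is better).
    """
    n, m = similarity.shape
    # Use negative similarity as the transport cost.
    cost = -similarity
    # Gibbs kernel; shifting by the minimum cost keeps entries in (0, 1].
    kernel = np.exp(-(cost - cost.min()) / epsilon)
    a = np.full(n, 1.0 / n)  # assumed uniform mass over modality-A cells
    b = np.full(m, 1.0 / m)  # assumed uniform mass over modality-B cells
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    # Soft matching matrix: plan[i, j] is the mass moved from i to j.
    return u[:, None] * kernel * v[None, :]


# Toy data: modality B is a shuffled, slightly noisy copy of modality A,
# standing in for embeddings produced by a contrastively trained encoder.
rng = np.random.default_rng(0)
n_cells, dim = 8, 32
z_a = rng.normal(size=(n_cells, dim))
z_a /= np.linalg.norm(z_a, axis=1, keepdims=True)
perm = rng.permutation(n_cells)
z_b = z_a[perm] + 0.01 * rng.normal(size=(n_cells, dim))
z_b /= np.linalg.norm(z_b, axis=1, keepdims=True)

similarity = z_a @ z_b.T  # cosine similarity between the two modalities
plan = sinkhorn_matching(similarity)
recovered = plan.argmax(axis=1)  # most likely partner for each A cell
```

In this sketch, smaller values of `epsilon` push the plan closer to a hard one-to-one assignment, while larger values spread mass over several candidate partners, which is why the result is a soft, approximate matching rather than an exact one.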