Great patients embed alike: contrastive learning for sample representation from single-cell data

ICLR 2026 Conference Submission22370 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: set representation learning, contrastive learning, multiple-instance learning, single-cell transcriptomics, computational biology, sample representation
TL;DR: We create a contrastive learning framework for efficient self-supervised learning of sample representations from single-cell data
Abstract: Single-cell transcriptomics has revolutionized cellular biology by measuring gene activity in thousands of cells per donor, giving insights into cellular processes in normal tissue and early-stage disease. Sample representation methods encode all cells from one patient as a single patient vector, enabling applications of single-cell data for health state classification, prediction of future clinical status, and patient stratification. However, current single-cell datasets have fewer than a thousand samples, making it challenging for the models to learn generalisable and robust sample representations. To overcome this limitation, we suggest learning sample representations in a self-supervised way, relying on set representation invariance to subsampling. We develop SampleCLR, a contrastive learning method, which can be extended to supervised task prediction by the multiple instance learning framework. We show that SampleCLR outperforms unsupervised methods when trained in a self-supervised way, and reaches state-of-the-art quality of sample representation when fine-tuned on a supervised task, despite having orders of magnitude fewer parameters than other methods. We further demonstrate that SampleCLR is interpretable by design via the cell importance module and learns signatures of COVID-19 severity. We envision SampleCLR to pave the way for diagnostic applications from single-cell data. The code is available at https://github.com/sampleclr-iclr2026/SampleCLR.
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 22370
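
The subsampling invariance described in the abstract can be illustrated with a small toy example. The sketch below is not the authors' SampleCLR implementation: it assumes a mean-pooling set encoder in place of a learned network (and has no cell importance module), it uses an InfoNCE-style loss as one common contrastive objective (the abstract does not say which loss SampleCLR uses), and it runs on synthetic data. All function names and parameter values (`subsample`, `mean_pool`, `info_nce`, `make_toy_samples`, `temperature=0.5`, 50 cells per view) are hypothetical choices made for illustration.

```python
import math
import random


def subsample(cells, k, rng):
    """Draw k cells without replacement from one sample (a list of cell vectors)."""
    return rng.sample(cells, k)


def mean_pool(cells):
    """Toy set encoder (an assumption): average the cell vectors, then L2-normalise."""
    dim = len(cells[0])
    mean = [sum(c[j] for c in cells) / len(cells) for j in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(views_a, views_b, temperature=0.5):
    """InfoNCE-style loss: view i in A should match view i in B
    against the views of all other samples in B."""
    total = 0.0
    for i, za in enumerate(views_a):
        logits = [dot(za, zb) / temperature for zb in views_b]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(views_a)


def make_toy_samples(rng, n_samples=4, n_cells=200, dim=8):
    """Synthetic data: each sample is a cloud of cells around its own random centre."""
    samples = []
    for _ in range(n_samples):
        centre = [rng.gauss(0, 1) for _ in range(dim)]
        samples.append(
            [[c + rng.gauss(0, 0.1) for c in centre] for _ in range(n_cells)]
        )
    return samples


rng = random.Random(0)
samples = make_toy_samples(rng)

# Two independent random subsets of the same sample form a positive pair.
views_a = [mean_pool(subsample(s, 50, rng)) for s in samples]
views_b = [mean_pool(subsample(s, 50, rng)) for s in samples]

loss = info_nce(views_a, views_b)
# Pairing each sample with a different sample's view should give a higher loss.
mismatched_loss = info_nce(views_a, views_b[::-1])
print(f"matched: {loss:.3f}, mismatched: {mismatched_loss:.3f}")
```

In this toy setting the two subsets of a sample land close together in embedding space, so the matched pairing gives a lower loss than a deliberately mismatched one. A trainable encoder would be optimised to reduce this loss; the sketch only evaluates it for a fixed pooling function.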