Learning Population-Level Representations with Joint Embedding Predictive Architectures

ICLR 2026 Conference Submission 22367 Authors

20 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Representation learning, JEPA, single cell transcriptomics, patient representation
TL;DR: We developed a self-supervised approach for learning representations of multivariate population data and showcase its use in the biomedical field for molecular patient stratification.
Abstract: Multivariate population data is ubiquitous across scientific and real-world domains, arising in settings where the identity of a system is revealed through the composition of its constituent samples. For example, a patient’s clinical state can be inferred from the joint analysis of their blood cells, while the properties of a galaxy can be characterized from the distribution of its stars and their spectra. To our knowledge, attempts to learn representations of such data remain limited, largely because its inductive structure is subtle, making feature extraction particularly challenging. Inspired by recent advances in joint embedding predictive architectures, we challenge the prevailing assumption that population-level data lacks sufficient signal for representation learning, and show that by leveraging both the compositional structure of the data and the properties of individual samples, rich and expressive representations can indeed be learned. We demonstrate our approach in the biomedical domain, addressing the long-standing challenge of scaling machine learning to large single-cell transcriptomics datasets for patient representation.
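The abstract describes learning a single representation for a population (e.g., a patient's set of cells) with a joint embedding predictive architecture. The paper's actual model is not shown on this page; as a rough illustration only, the sketch below shows the generic JEPA pattern applied to a set of samples: a permutation-invariant context encoder embeds a visible subset, a momentum (EMA) target encoder embeds a held-out subset, and a predictor is trained to predict the target embedding in latent space. All dimensions, the mean-pooling encoder, and the masking scheme are hypothetical choices, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical): a "population" is a set of N_SAMPLES samples
# (e.g. cells), each described by an IN_DIM-dimensional feature vector.
N_SAMPLES, IN_DIM, EMB_DIM = 128, 16, 8

def encoder(W, X):
    """Permutation-invariant set encoder: embed each sample, then mean-pool."""
    H = np.tanh(X @ W)      # per-sample embeddings, shape (n, EMB_DIM)
    return H.mean(axis=0)   # population-level embedding, shape (EMB_DIM,)

# Context and target encoders share an architecture; the target encoder is
# an exponential-moving-average copy of the context weights (JEPA-style).
W_ctx = rng.normal(scale=0.1, size=(IN_DIM, EMB_DIM))
W_tgt = W_ctx.copy()
W_pred = np.eye(EMB_DIM)    # linear predictor: context emb -> target emb

X = rng.normal(size=(N_SAMPLES, IN_DIM))   # one population of samples
mask = rng.random(N_SAMPLES) < 0.5         # random split into two views

z_ctx = encoder(W_ctx, X[mask])    # embedding of the visible subset
z_tgt = encoder(W_tgt, X[~mask])   # embedding of the masked subset (no grads)

pred = z_ctx @ W_pred
loss = float(np.mean((pred - z_tgt) ** 2))  # predict in latent space

# Momentum update keeps the target encoder a slow average of the context one.
W_tgt = 0.99 * W_tgt + 0.01 * W_ctx
```

Predicting in embedding space rather than reconstructing raw features is what distinguishes JEPA-style training from autoencoding; the mean-pooled set encoder here stands in for whatever population-level aggregation the paper actually uses.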
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 22367