Deep generative modeling of sample-level heterogeneity in single-cell genomics

Published: 10 May 2024, Last Modified: 05 Feb 2025PreprintEveryoneCC BY 4.0
Abstract: The field of single-cell genomics is now observing a marked increase in the prevalence of cohort-level studies that include hundreds of samples and feature complex designs. These data have tremendous potential for discovering how sample or tissue-level phenotypes relate to cellular and molecular composition. However, current analyses are based on simplified representations of these data by averaging information across cells. We present MrVI, a deep generative model designed to realize the potential of cohort studies at the single-cell level. MrVI tackles two fundamental and intertwined problems: stratifying samples into groups and evaluating the cellular and molecular differences between groups, both without requiring a priori grouping of cells into types or states. Due to its single-cell perspective, MrVI is able to detect clinically relevant stratifications of patients in COVID-19 and inflammatory bowel disease (IBD) cohorts that are only manifested in certain cellular subsets, thus enabling new discoveries that would otherwise be overlooked. Similarly, we demonstrate that MrVI can de-novo identify groups of small molecules with similar biochemical properties and evaluate their effects on cellular composition and gene expression in large-scale perturbation studies. MrVI is available as open source at scvi-tools.org.
Loading