Representation learning of single-cell RNA-seq data

Constantin Ahlmann-Eltze, Florian Barkmann, Jan Lause, Valentina Boeva, Dmitry Kobak

Published: 08 Jan 2026. Last Modified: 26 Jan 2026. Venue: RNA. License: CC BY-SA 4.0

Abstract: Single-cell RNA sequencing (scRNA-seq) has become a cornerstone experimental technique in cellular biology, with gene expression data for over 100 million sequenced cells available in public repositories. The high dimensionality, sparsity, and technical noise inherent to scRNA-seq data have motivated the development of a broad spectrum of representation learning approaches. These methods learn denoised, low-dimensional representations of single-cell transcriptomes that can then be used for clustering, visualization, trajectory inference, and other downstream analyses. Furthermore, methods have emerged that learn latent representations based on scRNA-seq data pooled across multiple experiments. In this review, we frame factor models, autoencoders, contrastive learning approaches, and transformer-based foundation models as distinct paradigms of representation learning for scRNA-seq. We provide a coherent taxonomy of these methods that articulates their conceptual foundations, shared assumptions, and key distinctions. We also discuss existing benchmarks and identify the major challenges and open questions that will shape the future of the field.
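As a concrete illustration of the simplest paradigm named above, a linear factor model, the following sketch computes a PCA embedding of a cells-by-genes count matrix. The preprocessing (median library-size scaling, log1p, gene centering) is one common choice assumed here for illustration, not the review's prescribed pipeline. The helper `pca_embedding` and the simulated two-group data are hypothetical.

```python
import numpy as np


def pca_embedding(counts, n_components=10):
    """Return an (n_cells, n_components) PCA embedding of a cells-by-genes count matrix.

    Hypothetical helper: the preprocessing below is one common, assumed choice.
    """
    counts = np.asarray(counts, dtype=float)
    # Library-size normalization: rescale every cell to the median total count
    totals = counts.sum(axis=1)
    scale = np.median(totals) / np.maximum(totals, 1.0)
    normalized = counts * scale[:, None]
    # log1p transform to roughly stabilize the variance of counts
    logged = np.log1p(normalized)
    # Center each gene so that the SVD below is PCA
    centered = logged - logged.mean(axis=0, keepdims=True)
    # Truncated SVD: U * S gives the cell coordinates on the top principal axes
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Simulated toy data (not real scRNA-seq): two cell groups, 50 cells each,
# that differ only in 20 of 200 genes.
rng = np.random.default_rng(0)
base = rng.gamma(2.0, 1.0, size=200)
shifted = base.copy()
shifted[:20] *= 8.0
counts = np.vstack([
    rng.poisson(base, size=(50, 200)),
    rng.poisson(shifted, size=(50, 200)),
])

embedding = pca_embedding(counts, n_components=5)
print(embedding.shape)  # (100, 5)
```

Each row of the returned matrix is a low-dimensional representation of one cell that downstream steps such as clustering or visualization would consume. Roughly speaking, the nonlinear paradigms (autoencoders, contrastive models, transformers) replace the linear SVD step with a learned encoder.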

External IDs: doi:10.1261/rna.080889.125