TL;DR: We enforce a local Euclidean geometry in the latent manifold of scRNA-seq VAEs to improve downstream applications.
Abstract: Latent space interpolations are a powerful tool for navigating deep generative models in applied settings. An example is single-cell RNA sequencing, where existing methods model cellular state transitions as latent space interpolations with variational autoencoders, often assuming linear shifts and Euclidean geometry. However, unless explicitly enforced, linear interpolations in the latent space may not correspond to geodesic paths on the data manifold, limiting methods that assume Euclidean geometry in the data representations. We introduce FlatVI, a novel training framework that regularises the latent manifold of discrete-likelihood variational autoencoders towards Euclidean geometry, specifically tailored for modelling single-cell count data. By encouraging straight lines in the latent space to approximate geodesic interpolations on the decoded single-cell manifold, FlatVI enhances compatibility with downstream approaches that assume Euclidean latent geometry. Experiments on synthetic data support the theoretical soundness of our approach, while applications to time-resolved single-cell RNA sequencing data demonstrate improved trajectory reconstruction and manifold interpolation.
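The regularisation idea in the abstract can be made concrete with a small sketch. The abstract does not spell out the loss, so everything below is an assumption made for illustration: a toy log-linear decoder with a Poisson likelihood, the Fisher information of that likelihood as the metric on the decoded space, and a penalty that measures how far the resulting pullback metric G(z) is from a scaled identity alpha * I. The names `decoder_rates`, `pullback_metric`, `flatness_penalty`, `W`, `b` and `alpha` are hypothetical and are not taken from FlatVI.

```python
import math

def decoder_rates(z, W, b):
    """Toy log-linear decoder (assumption): mu_g = exp(W[g] . z + b[g]) > 0."""
    return [math.exp(sum(w_k * z_k for w_k, z_k in zip(w_g, z)) + b_g)
            for w_g, b_g in zip(W, b)]

def pullback_metric(z, W, b):
    """Pullback of the Poisson Fisher metric diag(1/mu) through the decoder.

    The decoder Jacobian is J = diag(mu) W, so
    G(z) = J^T diag(1/mu) J = W^T diag(mu) W.
    """
    mu = decoder_rates(z, W, b)
    d = len(z)
    return [[sum(mu[g] * W[g][i] * W[g][j] for g in range(len(W)))
             for j in range(d)]
            for i in range(d)]

def flatness_penalty(z, W, b, alpha=1.0):
    """Squared Frobenius distance between G(z) and alpha * I (assumed form)."""
    G = pullback_metric(z, W, b)
    d = len(z)
    return sum((G[i][j] - (alpha if i == j else 0.0)) ** 2
               for i in range(d) for j in range(d))

# Illustrative values only: 3 "genes", 2 latent dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    z = [0.0, 0.0]
    print(pullback_metric(z, W, b))   # [[2.0, 1.0], [1.0, 2.0]]
    print(flatness_penalty(z, W, b))  # 4.0
```

A penalty of zero at a point means the latent metric there is exactly alpha * I, so infinitesimal latent distances match distances on the decoded manifold up to a constant scale. In a training loop one would presumably average such a term over encoded latent points and add it to the usual VAE objective with a weighting factor; that weighting is likewise an assumption here.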
Lay Summary: Understanding how the state of a cell, the basic unit of life, changes during disease or development is a cornerstone of modern biomedical research. To a molecular biologist, a cell’s state is defined by the expression patterns of hundreds of genes at once, shaping the intricate molecular landscape of living organisms. To a data scientist, on the other hand, a collection of cells is a massive matrix of numbers, each entry capturing how many RNA copies of a particular gene were detected in a single cell.
Because profiling cells involves thousands of noisy, sparse measurements, it’s common to compress this complexity into a dense, interpretable representation known as a latent space. To study how cells evolve into one another, many techniques use linear interpolations of latent representations to approximate shifts within biological processes. But this raises a key question: how can we be sure that straight lines in this abstract space actually correspond to real biological progressions?
In our work, we propose a principled yet simple answer. We introduce a penalty in the latent space that encourages Euclidean behavior, so that the straight line between any two points approximates the shortest path. In essence, we shape the latent space so that straight lines correspond to plausible gene expression transitions, offering an intuitive and mathematically grounded way to explore how cells change. Our method can be easily plugged into existing single-cell analysis tools, enriching their insights with representations that align with their core assumptions.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Health / Medicine
Keywords: scRNA-seq, Riemannian geometry, representation learning, trajectory inference, VAEs, statistical manifolds
Submission Number: 12560