TL;DR: Modeling the evolution of high-dimensional systems from irregularly timed snapshots using overlapping windows of marginal distributions and measure-valued splines.
Abstract: Modeling the evolution of high-dimensional systems from limited snapshot observations at irregular time points poses a significant challenge in quantitative biology and related fields. Traditional approaches often rely on dimensionality reduction techniques, which can oversimplify the dynamics and fail to capture critical transient behaviors in non-equilibrium systems. We present Multi-Marginal Stochastic Flow Matching (MMSFM), a novel extension of simulation-free score and flow matching methods to the multi-marginal setting, enabling the alignment of high-dimensional data measured at non-equidistant time points without reducing dimensionality. The use of measure-valued splines enhances robustness to irregular snapshot timing, and score matching prevents overfitting in high-dimensional spaces. We validate our framework on several synthetic and benchmark datasets and apply it to single-cell perturbation data from melanoma cell lines and gene expression data collected at uneven time points.
Lay Summary: When studying how biological systems like cells respond to treatments, scientists can often only take snapshots at specific time points rather than continuously tracking individual cells. This is like having photos of a crowd at different times without knowing which person is which across photos, so we cannot see how individuals moved between snapshots.
We developed Multi-Marginal Stochastic Flow Matching (MMSFM) to reconstruct likely trajectories from snapshot data taken at irregular time intervals. Our method first uses optimal transport to find the most efficient way to match cells between consecutive snapshots, like pairing dancers between songs. Then, we use overlapping windows of three snapshots at a time to create smooth paths connecting these matched cells, similar to drawing multiple possible routes on a map. This overlapping approach makes our method robust to measurement timing and captures the inherent randomness in biological systems.
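The matching and windowing steps described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes every snapshot has the same number of cells, so that optimal transport with uniform weights reduces to a one-to-one assignment, and it uses ordinary per-cell cubic splines as a simple stand-in for the measure-valued splines of the paper. The function names `match_consecutive`, `overlapping_windows`, and `window_paths` are hypothetical.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import linear_sum_assignment


def match_consecutive(x_a, x_b):
    """Pair the points of two equally sized snapshots.

    Assumption: equal sizes and uniform weights, so the optimal transport
    plan under squared Euclidean cost is a permutation.
    Returns perm such that x_b[perm[i]] is matched to x_a[i].
    """
    cost = ((x_a[:, None, :] - x_b[None, :, :]) ** 2).sum(axis=-1)
    _, perm = linear_sum_assignment(cost)
    return perm


def overlapping_windows(num_snapshots, size=3):
    """Index tuples of consecutive snapshots, shifted by one each time."""
    return [tuple(range(i, i + size)) for i in range(num_snapshots - size + 1)]


def window_paths(snapshots, times, window):
    """Chain matched points across one window and fit a spline per chain.

    The spline is parameterized by the actual (possibly irregular)
    measurement times, so uneven spacing is handled directly.
    """
    chain = [snapshots[window[0]]]
    for k in window[1:]:
        perm = match_consecutive(chain[-1], snapshots[k])
        chain.append(snapshots[k][perm])
    points = np.stack(chain, axis=0)  # shape: (window size, cells, dims)
    t = np.asarray([times[k] for k in window], dtype=float)
    return CubicSpline(t, points, axis=0)


# Toy data: four snapshots of 50 "cells" in 5 dimensions at uneven times.
rng = np.random.default_rng(0)
times = [0.0, 0.5, 2.0, 3.0]
snapshots = [rng.normal(loc=t, size=(50, 5)) for t in times]

windows = overlapping_windows(len(snapshots))
paths = [window_paths(snapshots, times, w) for w in windows]
midpoint = paths[0](1.0)  # interpolated positions between measurements
```

In the method described here, such paths serve as regression targets for learned flow and score models rather than as the final output. The sketch only shows how consecutive snapshots can be coupled and how windows of three overlap.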
We successfully applied our method to track how melanoma cells respond to cancer drugs over time, revealing complex cellular dynamics that would otherwise remain hidden. This enables researchers to better understand drug resistance mechanisms and potentially design more effective treatments by inferring cellular behavior between measurement points.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Applications->Health / Medicine
Keywords: Stochastic Flow Matching, Multi-Marginal Optimal Transport, Irregular Time Points, Snapshot Data, Simulation-Free Methods, Score Matching, Generative Models, Single-Cell Data
Submission Number: 12986