Wasserstein Flow Matching: Generative Modeling Over Families of Distributions

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, License: CC BY-NC-ND 4.0
TL;DR: Flow Matching between distributions over distributions on the Wasserstein space for generation of Gaussians & general distributions.
Abstract: Generative modeling typically concerns transporting a single source distribution to a target distribution via simple probability flows. However, in fields like computer graphics and single-cell genomics, samples themselves can be viewed as distributions, where standard flow matching ignores their inherent geometry. We propose Wasserstein flow matching (WFM), which lifts flow matching onto families of distributions using the Wasserstein geometry. Notably, WFM is the first algorithm capable of generating distributions in high dimensions, whether represented analytically (as Gaussians) or empirically (as point-clouds). Our theoretical analysis establishes that Wasserstein geodesics constitute proper conditional flows over the space of distributions, making for a valid FM objective. Our algorithm leverages optimal transport theory and the attention mechanism, demonstrating versatility across computational regimes: exploiting closed-form optimal transport paths for Gaussian families, while using entropic estimates on point-clouds for general distributions. WFM successfully generates both 2D and 3D shapes and high-dimensional cellular microenvironments from spatial transcriptomics data. Code is available at [WassersteinFlowMatching](https://github.com/WassersteinFlowMatching/WassersteinFlowMatching/).
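To make the "closed-form optimal transport paths for Gaussian families" concrete, here is a minimal NumPy sketch of the standard Bures-Wasserstein geodesic between two Gaussians. It is an illustration of the textbook formula, not the authors' implementation. The function names (`sqrtm_psd`, `gaussian_ot_map`, `gaussian_geodesic`, `bures_wasserstein_sq`) are hypothetical, and the source covariance is assumed to be strictly positive definite.

```python
import numpy as np

def sqrtm_psd(S):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    S = 0.5 * (S + S.T)                      # symmetrize against round-off
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)                # clip tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T

def gaussian_ot_map(S0, S1):
    """Linear part A of the OT map x -> m1 + A (x - m0).

    Assumes S0 is positive definite (illustrative assumption).
    """
    r0 = sqrtm_psd(S0)
    r0_inv = np.linalg.inv(r0)
    return r0_inv @ sqrtm_psd(r0 @ S1 @ r0) @ r0_inv

def gaussian_geodesic(m0, S0, m1, S1, t):
    """Mean and covariance at time t on the Wasserstein geodesic."""
    A = gaussian_ot_map(S0, S1)
    M = (1.0 - t) * np.eye(len(m0)) + t * A
    return (1.0 - t) * m0 + t * m1, M @ S0 @ M.T

def bures_wasserstein_sq(m0, S0, m1, S1):
    """Squared 2-Wasserstein distance between two Gaussians."""
    r0 = sqrtm_psd(S0)
    cross = sqrtm_psd(r0 @ S1 @ r0)
    return np.sum((m0 - m1) ** 2) + np.trace(S0 + S1 - 2.0 * cross)
```

Calling `gaussian_geodesic(m0, S0, m1, S1, t)` for `t` in `[0, 1]` traces a path that starts at the source Gaussian and ends at the target; a flow matching objective over Gaussians could regress onto the time derivative of such a path, though how WFM parameterizes that target is not specified in the abstract.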
Lay Summary: Generative models seek to transform unstructured noise into coherent data. Contemporary methods, including Diffusion and Flow Matching, have demonstrated remarkable success in generating high-resolution images, realistic videos, and well-formed text. These approaches thus produce single examples from a data distribution formed by individual examples, and have not yet been applied to generate complex, multi-component data points such as 3D shapes or biological cell structures. Instead, these cases are themselves complex distributions with both internal geometry (i.e. relationships between the components) and external relationships (i.e. similarity of two sets of components). Our approach, Wasserstein Flow Matching (WFM), addresses this by directly learning how to transform these entire collections of points, or distributions, respecting both their internal geometries and their relationships in higher-order Wasserstein space. Whereas standard Flow Matching learns the transformation via straight lines from individual noise points to real data points, WFM is designed to imitate the most efficient pathways in the space of distributions, which are mathematically defined by the Wasserstein optimal transport maps. Applying this method, we have successfully generated complex 2D and 3D shapes, as well as detailed high-dimensional models of cellular environments from biological data.
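For collections of points, where no closed form exists, one common way to approximate an optimal transport pathway is entropic regularization (Sinkhorn iterations) followed by a barycentric projection. The sketch below illustrates that general technique under stated assumptions: uniform weights on both point clouds, squared Euclidean cost, and hypothetical names (`sinkhorn_plan`, `entropic_interpolant`) and defaults (`eps`, `n_iters`). It is not taken from the WFM code base.

```python
import numpy as np

def _logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def sinkhorn_plan(X, Y, eps=0.1, n_iters=500):
    """Entropic OT coupling between uniform point clouds X (n, d) and Y (m, d)."""
    n, m = len(X), len(Y)
    C = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # squared distances
    log_a = np.full(n, -np.log(n))
    log_b = np.full(m, -np.log(m))
    f, g = np.zeros(n), np.zeros(m)          # dual potentials
    for _ in range(n_iters):
        # Log-domain Sinkhorn updates: match row marginals, then column marginals.
        f = -eps * _logsumexp((g[None, :] - C) / eps + log_b[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - C) / eps + log_a[:, None], axis=0)
    return np.exp((f[:, None] + g[None, :] - C) / eps
                  + log_a[:, None] + log_b[None, :])

def entropic_interpolant(X, Y, t, eps=0.1, n_iters=500):
    """Move each source point a fraction t toward its barycentric image."""
    P = sinkhorn_plan(X, Y, eps, n_iters)
    T = (P @ Y) / P.sum(axis=1, keepdims=True)   # barycentric projection of X
    return (1.0 - t) * X + t * T
```

The regularization strength `eps` should be chosen relative to the scale of the squared distances: smaller values give couplings closer to unregularized optimal transport but need more iterations to converge.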
Link To Code: doronHav/WassersteinFlowMatching
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Optimal Transport, Flow Matching, Generative Modeling, Single Cell
Submission Number: 13287