Keywords: flow matching, optimal transport, semidiscrete optimal transport
TL;DR: Propose a new method to train flow matching models by pairing noise with data points via semidiscrete optimal transport instead of batch OT.
Abstract: Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE.
These models are often trained using flow matching, i.e. by sampling random pairs of noise and target points $(x_0,x_1)$ and ensuring that the velocity field is aligned, on average, with $x_1-x_0$ when evaluated along a time-indexed segment linking $x_0$ to $x_1$.
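For concreteness, here is a minimal sketch of that training objective, assuming the linear interpolation path $x_t=(1-t)x_0+tx_1$; the velocity network `v_theta`, the use of PyTorch, and all hyperparameters are illustrative assumptions rather than the paper's exact setup.

```python
import torch

def fm_loss(v_theta, x0, x1):
    """Conditional flow-matching loss for a batch of (noise, data) pairs.

    v_theta : callable (x_t, t) -> predicted velocity, same shape as x_t
    x0, x1  : tensors of shape (n, d), noise and target samples
    """
    n = x0.shape[0]
    t = torch.rand(n, 1)                  # one time per pair, t ~ U[0, 1]
    x_t = (1.0 - t) * x0 + t * x1         # point on the segment linking x0 to x1
    target = x1 - x0                      # constant velocity along that segment
    return ((v_theta(x_t, t) - target) ** 2).mean()
```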
While these noise/data pairs are sampled independently by default, they can also be selected more carefully by matching batches of $n$ noise points to $n$ target points using an optimal transport (OT) solver.
Although promising in theory, the OT flow matching (OT-FM) approach (Pooladian et al., 2023; Tong et al., 2024) is not widely used in practice.
Zhang et al. (2025) recently pointed out that OT-FM only truly starts paying off when the batch size $n$ grows significantly, a regime that only a multi-GPU implementation of the Sinkhorn algorithm can handle.
Unfortunately, the precomputation cost of running Sinkhorn quickly balloons, requiring $O(n^2/\varepsilon^2)$ operations for every $n$ pairs used to fit the velocity field, where $\varepsilon$ is a regularization parameter that typically must be small to yield good results.
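The batch-OT pairing step described above can be sketched as follows, assuming the POT library, a squared-Euclidean cost, and resampling of pairs from the entropic transport plan; the library choice and hyperparameters are illustrative, not the multi-GPU solver the paper refers to.

```python
import numpy as np
import ot  # Python Optimal Transport (POT); assumed available

def ot_fm_pairs(x0, x1, eps=0.05, rng=None):
    """Re-pair a batch of n noise points x0 with n data points x1 using
    entropic OT (Sinkhorn); this step costs O(n^2 / eps^2).

    x0, x1 : arrays of shape (n, d); eps : entropic regularization.
    Returns x1 reordered so that (x0[i], matched[i]) follows the OT plan.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x0.shape[0]
    a = b = np.full(n, 1.0 / n)                  # uniform marginals
    M = ot.dist(x0, x1)                          # squared-Euclidean cost matrix
    P = ot.sinkhorn(a, b, M / M.max(), reg=eps)  # n x n entropic transport plan
    # Sample one data index per noise point from the normalized rows of the plan.
    j = np.array([rng.choice(n, p=P[i] / P[i].sum()) for i in range(n)])
    return x1[j]
```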
To fulfill the theoretical promises of OT-FM, we propose to move away from batch OT and rely instead on a semidiscrete OT (SD-OT) formulation, which leverages the fact that the target dataset is usually of finite size $N$. The SD-OT problem is solved by estimating a dual potential vector of size $N$ using SGD; with that vector, freshly sampled noise vectors can then be matched at train time to data points at the cost of a maximum inner product search (MIPS) over the dataset.
Semidiscrete FM (SD-FM) removes the quadratic dependence on $n/\varepsilon$ that bottlenecks OT-FM. SD-FM outperforms both FM and OT-FM on all training metrics and under all inference budget constraints, across multiple datasets, for both unconditional and conditional generation, and when using mean-flow models.
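A simplified sketch of the semidiscrete step, under stated assumptions: unregularized SD-OT dual, cost $c(x,y)=\|x-y\|^2/2$, uniform weights $1/N$ on the data points, and a brute-force MIPS; the function names `fit_sd_ot_potential` and `sd_match` are hypothetical and the paper's actual solver, regularization, and approximate-MIPS choices may differ.

```python
import numpy as np

def fit_sd_ot_potential(data, sample_noise, steps=100_000, lr=1.0):
    """Estimate the SD-OT dual potential g in R^N by stochastic gradient ascent
    on the dual objective  E_x[ min_j c(x, y_j) - g_j ] + (1/N) * sum_j g_j.

    data         : array of shape (N, d), the target dataset
    sample_noise : callable returning one noise vector of shape (d,)
    """
    N = data.shape[0]
    g = np.zeros(N)
    half_sq_norms = 0.5 * (data ** 2).sum(axis=1)
    for _ in range(steps):
        x = sample_noise()
        # Best match under the current potential: argmin_j c(x, y_j) - g_j,
        # equivalently a maximum inner product search (MIPS) over the dataset.
        j_star = int(np.argmax(data @ x + g - half_sq_norms))
        # Stochastic gradient of the dual objective w.r.t. g (step size kept fixed
        # here for simplicity).
        grad = np.full(N, 1.0 / N)
        grad[j_star] -= 1.0
        g += lr * grad
    return g

def sd_match(x0, data, g):
    """Match a fresh noise vector x0 to a data point via MIPS with potential g."""
    scores = data @ x0 + g - 0.5 * (data ** 2).sum(axis=1)
    return data[int(np.argmax(scores))]
```

In this sketch, the matched pairs `(x0, sd_match(x0, data, g))` would replace the independently sampled pairs in the flow-matching loss above, so the $O(n^2/\varepsilon^2)$ batch-OT step is traded for a one-off dual estimation plus a MIPS per noise sample.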
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11298