Keywords: flow matching, optimal transport, semidiscrete optimal transport
TL;DR: Propose a new method to train flow matching models by pairing noise with data points via semidiscrete optimal transport instead of batch OT.
Abstract: Flow models parameterized as time-dependent velocity fields can generate data from noise by integrating an ODE.
These models are often trained using flow matching, i.e. by sampling random pairs of noise and target points $(x_0,x_1)$ and ensuring that the velocity field is aligned, on average, with $x_1-x_0$ when evaluated along a time-indexed segment linking $x_0$ to $x_1$.
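For concreteness, here is a minimal sketch of that training objective, assuming the linear interpolation path $x_t=(1-t)x_0+tx_1$; the velocity network `v_theta`, the use of PyTorch, and all hyperparameters are illustrative assumptions rather than the paper's exact setup.

```python
import torch

def fm_loss(v_theta, x0, x1):
    """Conditional flow-matching loss for a batch of (noise, data) pairs.

    v_theta : callable (x_t, t) -> predicted velocity, same shape as x_t
    x0, x1  : tensors of shape (n, d), noise and target samples
    """
    n = x0.shape[0]
    t = torch.rand(n, 1)                  # one time per pair, t ~ U[0, 1]
    x_t = (1.0 - t) * x0 + t * x1         # point on the segment linking x0 to x1
    target = x1 - x0                      # constant velocity along that segment
    return ((v_theta(x_t, t) - target) ** 2).mean()
```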
While these noise/data pairs are sampled independently by default, they can also be selected more carefully by matching batches of $n$ noise points to $n$ target points using an optimal transport (OT) solver.
Although promising in theory, the OT flow matching (OT-FM) approach (Pooladian et al., 2023; Tong et al., 2024) is not widely used in practice.
Zhang et al. (2025) recently pointed out that OT-FM only truly starts paying off when the batch size $n$ grows significantly, a regime that only a multi-GPU implementation of the Sinkhorn algorithm can handle.
Unfortunately, the precomputation cost of running Sinkhorn quickly balloons, requiring $O(n^2/\varepsilon^2)$ operations for every $n$ pairs used to fit the velocity field, where $\varepsilon$ is a regularization parameter that typically must be small to yield good results.
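The batch-OT pairing step described above can be sketched as follows, assuming the POT library, a squared-Euclidean cost, and resampling of pairs from the entropic transport plan; the library choice and hyperparameters are illustrative, not the multi-GPU solver the paper refers to.

```python
import numpy as np
import ot  # Python Optimal Transport (POT); assumed available

def ot_fm_pairs(x0, x1, eps=0.05, rng=None):
    """Re-pair a batch of n noise points x0 with n data points x1 using
    entropic OT (Sinkhorn); this step costs O(n^2 / eps^2).

    x0, x1 : arrays of shape (n, d); eps : entropic regularization.
    Returns x1 reordered so that (x0[i], matched[i]) follows the OT plan.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x0.shape[0]
    a = b = np.full(n, 1.0 / n)                  # uniform marginals
    M = ot.dist(x0, x1)                          # squared-Euclidean cost matrix
    P = ot.sinkhorn(a, b, M / M.max(), reg=eps)  # n x n entropic transport plan
    # Sample one data index per noise point from the normalized rows of the plan.
    j = np.array([rng.choice(n, p=P[i] / P[i].sum()) for i in range(n)])
    return x1[j]
```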
To fulfill the theoretical promises of OT-FM, we propose to move away from batch OT and rely instead on a semidiscrete OT (SD-OT) formulation, which leverages the fact that the target dataset is usually of finite size $N$. The SD-OT problem is solved by estimating a dual potential vector of size $N$ using SGD; with that vector, freshly sampled noise vectors can then be matched at train time to data points at the cost of a maximum inner product search (MIPS) over the dataset.
Semidiscrete FM (SD-FM) removes the quadratic dependence on $n/\varepsilon$ that bottlenecks OT-FM. SD-FM outperforms both FM and OT-FM on all training metrics and under all inference budget constraints, across multiple datasets, for both unconditional and conditional generation, and when using mean-flow models.
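A simplified sketch of the semidiscrete step, under stated assumptions: unregularized SD-OT dual, cost $c(x,y)=\|x-y\|^2/2$, uniform weights $1/N$ on the data points, and a brute-force MIPS; the function names `fit_sd_ot_potential` and `sd_match` are hypothetical and the paper's actual solver, regularization, and approximate-MIPS choices may differ.

```python
import numpy as np

def fit_sd_ot_potential(data, sample_noise, steps=100_000, lr=1.0):
    """Estimate the SD-OT dual potential g in R^N by stochastic gradient ascent
    on the dual objective  E_x[ min_j c(x, y_j) - g_j ] + (1/N) * sum_j g_j.

    data         : array of shape (N, d), the target dataset
    sample_noise : callable returning one noise vector of shape (d,)
    """
    N = data.shape[0]
    g = np.zeros(N)
    half_sq_norms = 0.5 * (data ** 2).sum(axis=1)
    for _ in range(steps):
        x = sample_noise()
        # Best match under the current potential: argmin_j c(x, y_j) - g_j,
        # equivalently a maximum inner product search (MIPS) over the dataset.
        j_star = int(np.argmax(data @ x + g - half_sq_norms))
        # Stochastic gradient of the dual objective w.r.t. g (step size kept fixed
        # here for simplicity).
        grad = np.full(N, 1.0 / N)
        grad[j_star] -= 1.0
        g += lr * grad
    return g

def sd_match(x0, data, g):
    """Match a fresh noise vector x0 to a data point via MIPS with potential g."""
    scores = data @ x0 + g - 0.5 * (data ** 2).sum(axis=1)
    return data[int(np.argmax(scores))]
```

In this sketch, the matched pairs `(x0, sd_match(x0, data, g))` would replace the independently sampled pairs in the flow-matching loss above, so the $O(n^2/\varepsilon^2)$ batch-OT step is traded for a one-off dual estimation plus a MIPS per noise sample.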
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11298