Abstract: Flow map matching (FMM) enables one- and few-step sampling for diffusion-style generation, yet its performance is often hindered by the mismatch between ground-truth training transitions and model-induced flow maps.
We propose **Contrastive Flow Map Matching (CFMM)**, a principled framework that explicitly aligns FMM training with practical sampling.
Our approach is motivated by a joint-KL decomposition of the reverse KL divergence, which splits the distributional gap into a marginal mismatch over intermediate states and a conditional mismatch in endpoint reconstruction.
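For reference, the generic chain rule of the KL divergence produces a split of this form. The notation is an assumption for illustration only: $p_\theta$ denotes the model-induced joint distribution, $q$ the ground-truth training joint, $x_t$ an intermediate state, and $x_0$ the reconstructed endpoint. The paper's exact variables, direction conventions, and conditioning may differ.

```latex
D_{\mathrm{KL}}\big(p_\theta(x_t, x_0)\,\|\,q(x_t, x_0)\big)
= \underbrace{D_{\mathrm{KL}}\big(p_\theta(x_t)\,\|\,q(x_t)\big)}_{\text{marginal mismatch}}
+ \underbrace{\mathbb{E}_{x_t \sim p_\theta(x_t)}\Big[D_{\mathrm{KL}}\big(p_\theta(x_0 \mid x_t)\,\|\,q(x_0 \mid x_t)\big)\Big]}_{\text{conditional mismatch}}
```

Because marginalizing cannot increase a KL divergence, the joint term also upper-bounds the endpoint gap $D_{\mathrm{KL}}\big(p_\theta(x_0)\,\|\,q(x_0)\big)$, so reducing both terms on the right tightens that bound.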
This analysis motivates two complementary objectives: average-velocity regression for marginal alignment and a sampling-aligned InfoNCE contrastive loss for conditional refinement.
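The two objectives can be sketched as a combined loss. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the average-velocity target, the feature vectors for generated and paired training images, the weight `lam`, and the `temperature` are all hypothetical inputs, and every function name below is introduced here for illustration. The contrastive term is a standard InfoNCE in which each generated endpoint's paired training image is the positive and the other training images in the batch serve as negatives.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (eps avoids divide-by-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def avg_velocity_mse(pred: List[Vector], target: List[Vector]) -> float:
    """Hypothetical marginal term: squared L2 error per sample, averaged over the batch."""
    sq = [sum((p - t) ** 2 for p, t in zip(pv, tv)) for pv, tv in zip(pred, target)]
    return sum(sq) / len(sq)


def info_nce(gen_feats: List[Vector], data_feats: List[Vector],
             temperature: float = 0.1) -> float:
    """InfoNCE: gen_feats[i] and data_feats[i] form the positive pair; every other
    data_feats[j] in the batch acts as a negative for gen_feats[i]."""
    n = len(gen_feats)
    total = 0.0
    for i in range(n):
        logits = [_cosine(gen_feats[i], data_feats[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive pair
    return total / n


def cfmm_style_loss(pred_u: List[Vector], target_u: List[Vector],
                    gen_feats: List[Vector], data_feats: List[Vector],
                    lam: float = 0.5, temperature: float = 0.1) -> float:
    """Hypothetical combined objective: marginal regression + lam * contrastive term."""
    return avg_velocity_mse(pred_u, target_u) + lam * info_nce(gen_feats, data_feats, temperature)
```

With correctly paired features the contrastive term is close to zero, while shuffling the pairs drives it up; this is the behavior that pushes each one-step output toward its own paired image rather than toward any plausible image. In a real training loop these quantities would be tensors with gradients, and the feature extractor is left unspecified here.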
CFMM is a training-only plug-in that can fine-tune pre-trained FMMs or train FMMs from scratch, and it incurs no inference-time overhead.
Experiments on CIFAR-10, ImageNet, and LSUN across multiple FMM baselines demonstrate consistent improvements in fidelity and perceptual quality with only modest additional training cost.
Lay Summary: Modern image generation models can create high-quality images, but they often need many repeated steps to turn random noise into a realistic picture. Recent fast generation methods try to reduce this process to only one or a few steps, but this is difficult because small errors made during training can become larger when the model is actually used for sampling. In this work, we study this mismatch between how flow-map models are trained and how they generate images. We propose Contrastive Flow Map Matching, a training method that helps the model better follow its own generation path. Our method combines a standard objective that improves the movement from noisy states to cleaner states with a contrastive objective that encourages the generated image to stay semantically close to its paired training image. This means the model learns not only to move in the right direction, but also to preserve important visual content such as object identity, layout, and structure. The method is used only during training, so it does not make image generation slower at test time. Experiments on several image datasets and model families show that this approach improves one-step and few-step generation with only modest additional training cost.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Flow Map Matching, Contrastive Learning, Few-step Sampling
Originally Submitted PDF: pdf
Submission Number: 5420