Gaussian Mixture Flow Matching Models

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, CC BY 4.0
TL;DR: GMFlow generalizes diffusion models by predicting Gaussian mixture denoising distributions, enabling precise few-step sampling and high-quality generation.
Abstract: Diffusion models approximate the denoising distribution as a Gaussian and predict its mean, whereas flow matching models reparameterize the Gaussian mean as flow velocity. However, they underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow matching (GMFlow) model: instead of predicting the mean, GMFlow predicts dynamic Gaussian mixture (GM) parameters to capture a multi-modal flow velocity distribution, which can be learned with a KL divergence loss. We demonstrate that GMFlow generalizes previous diffusion and flow matching models where a single Gaussian is learned with an $L_2$ denoising loss. For inference, we derive GM-SDE/ODE solvers that leverage analytic denoising distributions and velocity fields for precise few-step sampling. Furthermore, we introduce a novel probabilistic guidance scheme that mitigates the over-saturation issues of CFG and improves image generation quality. Extensive experiments demonstrate that GMFlow consistently outperforms flow matching baselines in generation quality, achieving a Precision of 0.942 with only 6 sampling steps on ImageNet 256$\times$256.
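To make the training objective concrete, below is a minimal sketch of one way to parameterize a Gaussian-mixture velocity head and train it with a mixture negative log-likelihood, which matches a KL-divergence objective up to an additive constant (the data entropy). The class and function names, the isotropic per-component variance, and all shapes are illustrative assumptions, not the repository's actual API; see the linked code for the authors' implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class GMVelocityHead(nn.Module):
    """Hypothetical output head: predicts K component means, per-component
    isotropic log-stds, and mixture logits for the flow velocity u.
    (Sketch only; the official GMFlow repo may parameterize this differently.)"""

    def __init__(self, feat_dim: int, out_dim: int, num_components: int = 8):
        super().__init__()
        self.K, self.D = num_components, out_dim
        self.means = nn.Linear(feat_dim, num_components * out_dim)
        self.log_stds = nn.Linear(feat_dim, num_components)
        self.logits = nn.Linear(feat_dim, num_components)

    def forward(self, feats: torch.Tensor):
        B = feats.shape[0]
        mu = self.means(feats).view(B, self.K, self.D)   # (B, K, D) component means
        log_std = self.log_stds(feats)                    # (B, K) isotropic log-stds
        logits = self.logits(feats)                       # (B, K) mixture logits
        return mu, log_std, logits


def gm_velocity_loss(mu, log_std, logits, u_target):
    """Negative log-likelihood of the target velocity under the predicted
    Gaussian mixture; minimizing it in expectation corresponds to the
    KL-divergence loss up to a constant."""
    B, K, D = mu.shape
    diff = u_target.unsqueeze(1) - mu                     # (B, K, D)
    inv_var = torch.exp(-2.0 * log_std).unsqueeze(-1)     # (B, K, 1)
    comp_logp = (-0.5 * (diff.pow(2) * inv_var).sum(-1)   # per-component log N(u | mu_k, sigma_k^2 I)
                 - D * log_std
                 - 0.5 * D * math.log(2.0 * math.pi))     # (B, K)
    log_w = F.log_softmax(logits, dim=-1)                 # (B, K) log mixture weights
    return -torch.logsumexp(log_w + comp_logp, dim=-1).mean()


# Example usage with random features and targets (shapes are illustrative):
head = GMVelocityHead(feat_dim=512, out_dim=4, num_components=8)
feats = torch.randn(16, 512)
u = torch.randn(16, 4)
loss = gm_velocity_loss(*head(feats), u)
loss.backward()
```

With a single component (K = 1) and fixed unit variance, this objective reduces to the standard $L_2$ denoising/velocity loss, which is the sense in which GMFlow generalizes diffusion and flow matching; at inference, the predicted mixture gives an analytic denoising distribution that the GM-SDE/ODE solvers in the paper exploit for few-step sampling.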
Lay Summary: Image generation models like diffusion and flow matching have revolutionized digital content creation but still face challenges. They often need many computational steps to generate high-quality images and tend to produce overly vivid, unrealistic colors when guided to follow specific styles or prompts. To address these issues, we developed Gaussian Mixture Flow Matching (GMFlow). Unlike previous methods that predict only one possible outcome for each step of image creation, GMFlow predicts multiple possible outcomes simultaneously, capturing a richer set of variations through something called a Gaussian mixture. We then designed specialized algorithms to efficiently generate high-quality images with fewer steps and less computational effort. Additionally, we introduced a probabilistic approach to better control image styles, reducing unrealistic colors. Our method significantly improves image generation quality, producing clearer, more realistic images faster. This advancement means generating high-quality visuals becomes quicker and more reliable, benefiting applications ranging from digital art to realistic virtual environments.
Link To Code: https://github.com/Lakonik/GMFlow
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: GMFlow, diffusion models, flow matching, Gaussian mixture
Submission Number: 847