TL;DR: We study Variational Rectified Flow Matching, a framework that improves classic rectified flow matching by modeling multi-modal velocity vector fields, and demonstrate compelling results on synthetic data, MNIST, CIFAR-10, and ImageNet.
Abstract: We study Variational Rectified Flow Matching, a framework that enhances classic rectified flow matching by modeling multi-modal velocity vector-fields. At inference time, classic rectified flow matching 'moves' samples from a source distribution to the target distribution by solving an ordinary differential equation via integration along a velocity vector-field. At training time, the velocity vector-field is learnt by linearly interpolating between randomly coupled samples, one drawn from the source distribution and one drawn from the target distribution. This leads to "ground-truth" velocity vector-fields that point in different directions at the same location, i.e., the velocity vector-fields are multi-modal/ambiguous. However, since training uses a standard mean-squared-error loss, the learnt velocity vector-field averages the "ground-truth" directions and isn't multi-modal. In contrast, variational rectified flow matching learns and samples from multi-modal flow directions. We show on synthetic data, MNIST, CIFAR-10, and ImageNet that variational rectified flow matching leads to compelling results.
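To make the training setup the abstract contrasts against concrete, below is a minimal sketch of the classic rectified flow matching objective: sample a random time, linearly interpolate a coupled source/target pair, and regress the constant velocity with a mean-squared error. This is an illustrative assumption, not the paper's code; names such as `VelocityNet` and `rectified_flow_loss` are placeholders, and the variational variant described in the abstract would additionally condition the velocity model on a latent variable to capture the multi-modal targets.

```python
# Minimal sketch (illustrative, not the authors' implementation):
# classic rectified flow matching training step in PyTorch.

import torch
import torch.nn as nn


class VelocityNet(nn.Module):
    """Placeholder velocity model v_theta(x_t, t) for 2-D toy data."""

    def __init__(self, dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t], dim=-1))


def rectified_flow_loss(model, x0, x1):
    """Regress the velocity x1 - x0 along x_t = (1 - t) * x0 + t * x1.

    Different random couplings (x0, x1) can pass through the same (x_t, t),
    so the regression target is multi-modal; the MSE-optimal model predicts
    the mean of those directions, which is the averaging the abstract notes.
    """
    t = torch.rand(x0.shape[0], 1)
    x_t = (1.0 - t) * x0 + t * x1
    target_v = x1 - x0
    pred_v = model(x_t, t)
    return ((pred_v - target_v) ** 2).mean()


# Usage: one optimization step on randomly coupled source/target samples.
model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x0 = torch.randn(256, 2)        # source samples
x1 = torch.randn(256, 2) + 3.0  # stand-in target samples
loss = rectified_flow_loss(model, x0, x1)
opt.zero_grad()
loss.backward()
opt.step()
```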
Lay Summary: (1) At the heart of recent advances in diffusion and flow matching methods lies a major assumption: Existing models assume a single deterministic velocity at each point in the data-time space. (2) With Variational Rectified Flow Matching (VRFM), we introduce a new framework that models the multi-modal distribution over velocity directions, unlocking a richer class of generative models. (3) VRFM is a shift in how we think about learning continuous dynamics. We demonstrate consistent, strong performance across datasets (Synthetic, MNIST, CIFAR-10, ImageNet) and architectures (UNet, DiT). This opens the door to more expressive, uncertainty-aware generative modeling.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Flow Matching, Diffusion Model, Generative Model
Submission Number: 1962