Keywords: Generative models, Flow model, Optimal transport, Curvature, Ordinary Differential Equation (ODE)
TL;DR: A rectified flow-based generative model that improves generation quality and straightens ODE solution trajectories by incorporating real images via slerp into the reflow training process.
Abstract: Rectified flow is a generative model that learns a smooth transport map between two distributions through an ordinary differential equation (ODE). It learns a straight ODE through reflow steps, which iteratively update the supervisory flow, allowing relatively simple and efficient generation of high-quality images. However, rectified flow still faces several challenges. 1) The reflow process is slow because it requires a large number of generated pairs to model the target distribution. 2) Using suboptimal fake samples in reflow is known to degrade the learned flow model. This issue is further exacerbated by error accumulation across reflow steps and by the model collapse that self-consuming training causes in denoising autoencoder models.
In this work, we go one step further and empirically demonstrate that the reflow process causes the learned model to drift away from the target distribution, which in turn leads to a growing discrepancy in reconstruction error between fake and real images. To address this drift problem, we design a new reflow step, conic reflow. It supervises the model with the inversions of real data points through the previously learned model and with their interpolations with random initial points. Conic reflow offers several advantages. 1) It keeps the ODE paths directed toward real samples, as evaluated by reconstruction. 2) It uses far fewer generated samples: 600K rather than 4M. 3) The learned model generates higher-quality images, as evaluated by FID, IS, and Recall. 4) The learned flow is straighter than others, as evaluated by curvature. We achieve much lower FID in both one-step and full-step generation on CIFAR-10. Conic reflow also generalizes to other datasets such as LSUN Bedroom and ImageNet.
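The interpolation between an inverted real sample and random initial points can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the helper name `conic_noise_samples`, the uniform sampling of the interpolation weight, and the `max_t` bound are hypothetical, and `z_inv` stands in for the noise obtained by inverting a real image through the previously learned model.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between two arrays of the same shape."""
    a, b = z0.ravel(), z1.ravel()
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    s = np.sin(omega)
    if s < 1e-6:
        # Nearly parallel or antiparallel vectors: fall back to linear interpolation.
        return (1.0 - t) * z0 + t * z1
    return (np.sin((1.0 - t) * omega) / s) * z0 + (np.sin(t * omega) / s) * z1

def conic_noise_samples(z_inv, num_samples, max_t, rng):
    """Hypothetical helper: move an inverted noise `z_inv` toward fresh
    Gaussian noise by a random slerp weight in [0, max_t]."""
    samples = []
    for _ in range(num_samples):
        eps = rng.standard_normal(z_inv.shape)  # random initial point
        t = rng.uniform(0.0, max_t)             # assumed sampling scheme
        samples.append(slerp(z_inv, eps, t))
    return np.stack(samples)

rng = np.random.default_rng(0)
z_inv = rng.standard_normal((3, 32, 32))  # stand-in for an inverted CIFAR-10 image
neighbors = conic_noise_samples(z_inv, num_samples=4, max_t=0.3, rng=rng)
print(neighbors.shape)
```

Slerp is used here rather than linear interpolation because it keeps the interpolated points at a norm comparable to that of the endpoints, which matters for high-dimensional Gaussian noise; how the resulting points are paired with images during training is left to the paper.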
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 4067