Analyzing and Mitigating Model Collapse in Reflow Methods

Published: 22 Jan 2026 · Last Modified: 06 Mar 2026 · CPAL 2026 (Proceedings Track) Oral · License: CC BY 4.0
Keywords: Model Collapse, Self-training, Synthetic Data, Reflow, Rectified Flow
TL;DR: Repeated Reflow can collapse under purely synthetic recursion; a consistent fraction of real data stabilizes it, as predicted by a DAE theory and supported by image experiments.
Abstract: Generative models increasingly encounter synthetic data produced by earlier model snapshots, either unintentionally through data contamination or deliberately through self-training procedures such as Reflow. In rectified flow and related diffusion/flow systems, Reflow retrains on model-generated samples to straighten trajectories and accelerate sampling, but repeated self-training can degrade sample quality and diversity. We provide a mechanistic analysis of this failure mode and a principled mitigation strategy. Using a linear denoising autoencoder (DAE) as a tractable surrogate for Reflow-style recursion, we show that under purely synthetic recursive training the end-to-end linear map contracts: its operator norm decays to zero at a geometric rate, reflecting a progressive loss of representational power. We further prove that augmenting each Reflow round with a fixed fraction of real data prevents this degeneration by keeping the operator norm bounded away from zero. Finally, we validate that the qualitative trends implied by the theory are observable in practical Reflow pipelines on toy settings and image benchmarks, and we show that simple real-data–augmented Reflow schemes preserve Reflow's sampling-speed benefits while maintaining image quality.
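The abstract's contraction argument can be illustrated with a minimal numerical sketch (not the paper's actual experiments; all constants here — signal std `s0`, noise std `sigma`, real-data fraction `lam` — are assumed for illustration). A scalar linear denoiser fit by least squares shrinks its input, so training each round purely on the previous round's outputs drives the learned coefficient geometrically toward zero, while mixing in a fixed fraction of real data keeps it bounded away from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200_000, 1.0   # samples per round, denoising-noise std (assumed)
s0 = 2.0                  # real-data std (assumed)
rounds, lam = 30, 0.2     # self-training rounds; real-data fraction (assumed)

def fit_denoiser(x, sigma, rng):
    """Least-squares scalar linear denoiser w minimizing E[(w*(x+n) - x)^2]."""
    noisy = x + sigma * rng.normal(size=x.shape)
    return float(noisy @ x / (noisy @ noisy))

def run(real_frac):
    x = s0 * rng.normal(size=n)  # start from real data
    norms = []
    for _ in range(rounds):
        w = fit_denoiser(x, sigma, rng)
        norms.append(abs(w))  # operator norm of the scalar map
        # next round's training set: denoiser applied to fresh noisy inputs
        x = w * (x + sigma * rng.normal(size=n))
        if real_frac > 0:
            # replace a fixed fraction with fresh real samples
            k = int(real_frac * n)
            x[:k] = s0 * rng.normal(size=k)
    return norms

pure = run(0.0)
mixed = run(lam)
print(f"purely synthetic: |w| goes {pure[0]:.3f} -> {pure[-1]:.3f}")
print(f"{lam:.0%} real data:    |w| goes {mixed[0]:.3f} -> {mixed[-1]:.3f}")
```

In this toy recursion the optimal coefficient is w = s²/(s² + σ²) < 1, so the synthetic-data variance obeys s² → s⁴/(s² + σ²) and collapses to zero, whereas the real-data fraction pins the variance (and hence |w|) at a strictly positive fixed point — the scalar analogue of the operator-norm bound proved in the paper.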
Submission Number: 110