Frequency-Forcing: From Scaling-as-Time to Soft Frequency Guidance
Keywords: flow generative models, frequency representation, unified understanding and generation
Abstract: Standard flow-matching models transport noise to data along a single synchronized time variable, even though natural images are more plausibly generated in order from coarse low-frequency structure to fine high-frequency detail. Recent work offers two complementary ways to impose such an order: K-Flow rewrites the flow coordinate as a frequency scaling variable, while Latent Forcing keeps the pixel path intact and lets an auxiliary semantic stream mature earlier. We propose Frequency-Forcing, a soft frequency-ordering mechanism that keeps the standard pixel flow but guides it with an earlier-maturing low-frequency stream. The auxiliary signal is derived from the data itself through a lightweight learnable wavelet packet transform, which avoids dependence on a large pretrained semantic encoder. On ImageNet-256, Frequency-Forcing improves substantially over pixel-only baselines, outperforms fixed-frequency variants, and composes naturally with a semantic stream for further gains.
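To make the idea of an earlier-maturing low-frequency stream concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a fixed one-level Haar low-pass band as a stand-in for the learnable wavelet packet transform, a linear noise-to-data interpolation, and a hypothetical `lead` parameter that makes the auxiliary stream reach clean data before the pixel stream does. The names `haar_lowpass`, `stream_times`, and `noisy_pair` are invented for this example.

```python
from typing import Tuple

import numpy as np


def haar_lowpass(image: np.ndarray) -> np.ndarray:
    """Return the LL band of a one-level orthonormal 2D Haar transform.

    Assumption: a fixed Haar filter stands in for the learnable wavelet
    packet transform described in the abstract.
    """
    h, w = image.shape
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    # Group pixels into 2x2 blocks; the orthonormal LL coefficient is sum / 2.
    blocks = image.reshape(h // 2, 2, w // 2, 2)
    return blocks.sum(axis=(1, 3)) / 2.0


def stream_times(t: float, lead: float = 0.3) -> Tuple[float, float]:
    """Map a global time t in [0, 1] to (t_low, t_pix).

    Hypothetical schedule: the pixel stream keeps the standard time t, while
    the low-frequency stream runs faster and is fully clean at t = 1 - lead.
    """
    t_low = min(1.0, t / (1.0 - lead))
    return t_low, t


def noisy_pair(
    image: np.ndarray, t: float, rng: np.random.Generator, lead: float = 0.3
) -> Tuple[np.ndarray, np.ndarray]:
    """Sample (pixel stream, low-frequency stream) states at global time t.

    Uses the linear path x_t = (1 - t) * noise + t * data for both streams,
    each evaluated at its own time.
    """
    low = haar_lowpass(image)
    t_low, t_pix = stream_times(t, lead)
    x_pix = (1.0 - t_pix) * rng.standard_normal(image.shape) + t_pix * image
    x_low = (1.0 - t_low) * rng.standard_normal(low.shape) + t_low * low
    return x_pix, x_low


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = np.arange(16.0).reshape(4, 4)
    x_pix, x_low = noisy_pair(img, t=0.9, rng=rng)
    # At t = 0.9 the low-frequency stream is already clean; pixels are not.
    print(np.allclose(x_low, haar_lowpass(img)), np.allclose(x_pix, img))
```

In this sketch, a denoiser would receive both states and could condition the still-noisy pixel stream on the cleaner low-frequency stream, which is one way to read "soft" frequency ordering: the pixel path itself is unchanged.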
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 68