Visual Generation Without Guidance

Published: 01 May 2025 · Last Modified: 23 Jul 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: GFT is an efficient alternative to guided sampling that halves the computational cost of CFG while maintaining comparable performance.
Abstract: Classifier-Free Guidance (CFG) has become a default technique in visual generative models, yet it requires inference from both a conditional and an unconditional model during sampling. We propose to build visual models that are free from guided sampling. The resulting algorithm, Guidance-Free Training (GFT), matches the performance of CFG while reducing sampling to a single model, halving the computational cost. Unlike previous distillation-based approaches that rely on pretrained CFG networks, GFT enables training directly from scratch. GFT is simple to implement: it retains the same maximum likelihood objective as CFG and differs mainly in the parameterization of the conditional model. Implementing GFT requires only minimal modifications to existing codebases, as most design choices and hyperparameters are inherited directly from CFG. Our extensive experiments across five distinct visual models demonstrate the effectiveness and versatility of GFT. Across diffusion, autoregressive, and masked-prediction modeling, GFT consistently achieves comparable or even lower FID scores, with diversity-fidelity trade-offs similar to those of CFG baselines, all while being guidance-free.
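To make the halved sampling cost concrete, the sketch below contrasts a single CFG denoising step, which runs the network twice (conditional and unconditional) and mixes the outputs with the standard guidance combination (1 + w)·ε_cond − w·ε_uncond, against a guidance-free step that queries one model once. This is a minimal illustration assuming an epsilon-prediction diffusion model; the `model` and `sampling_model` callables and their signatures are hypothetical placeholders, not the paper's actual API or training procedure.

```python
import torch


@torch.no_grad()
def cfg_denoise_step(model, x_t, t, cond, w):
    """Classifier-Free Guidance: two forward passes per sampling step."""
    eps_cond = model(x_t, t, cond)         # conditional noise prediction
    eps_uncond = model(x_t, t, None)       # unconditional noise prediction
    # Standard CFG combination with guidance scale w.
    return (1 + w) * eps_cond - w * eps_uncond


@torch.no_grad()
def guidance_free_denoise_step(sampling_model, x_t, t, cond):
    """A guidance-free model (e.g. one trained with GFT) is sampled directly:
    a single forward pass per step, roughly halving inference cost."""
    return sampling_model(x_t, t, cond)
```

In this reading, the speedup comes entirely from removing the second forward pass at every sampling step, which is why the abstract describes the cost reduction as a factor of two.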
Lay Summary: Generative AI models can create realistic images, but they often rely on a technique called "Classifier-Free Guidance" (CFG), which uses two versions of the model during image generation. This makes the process slower and more expensive. In our work, we introduce a new method called Guidance-Free Training (GFT) that removes the need for this extra guidance. It’s easy to use, works with a variety of generative techniques, and cuts the computing cost in half. Our experiments show that GFT performs as well as, or even better than, CFG on many tasks, making it a more efficient and practical choice for building powerful AI image generators.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Visual Generative Modeling, Guidance, Guided Sampling, CFG, Efficiency, Distillation
Submission Number: 5783