Abstract: In this paper, we uncover the untapped potential of the diffusion U-Net, which serves as a “free lunch” that substantially improves generation quality on the fly. We first investigate the key contributions of the U-Net architecture to the denoising process and find that its main backbone primarily contributes to denoising, whereas its skip connections mainly introduce high-frequency features into the decoder module, which can cause the network to neglect crucial functions intrinsic to the backbone. Capitalizing on this discovery, we propose a simple yet effective method, termed “FreeU”, which enhances generation quality without additional training or finetuning. Our key insight is to strategically re-weight the contributions sourced from the U-Net's skip connections and backbone feature maps, so as to leverage the strengths of both components of the U-Net architecture. Promising results on image and video generation tasks demonstrate that FreeU can be readily integrated into existing diffusion models, e.g., Stable Diffusion, DreamBooth and ControlNet, to improve generation quality with only a few lines of code. All that is required is to adjust two scaling factors during inference.
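To make the re-weighting idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the decoder fuses backbone and skip features by channel-wise concatenation and that each of the two scaling factors is applied uniformly to its feature map; the abstract does not specify the exact operation. The function name `reweight_and_concat` and the parameter names `b` and `s` are hypothetical, and plain Python lists stand in for feature tensors.

```python
from typing import List

Channel = List[float]  # one flattened feature-map channel (stand-in for a tensor slice)


def reweight_and_concat(
    backbone: List[Channel],
    skip: List[Channel],
    b: float,
    s: float,
) -> List[Channel]:
    """Scale backbone channels by b and skip channels by s, then concatenate.

    Assumption: uniform scaling of each feature map. This is a simplification
    for illustration, not the exact operation used in the paper.
    """
    scaled_backbone = [[b * v for v in ch] for ch in backbone]
    scaled_skip = [[s * v for v in ch] for ch in skip]
    # Concatenate along the channel axis, as a U-Net decoder block would.
    return scaled_backbone + scaled_skip


# Toy inputs: two backbone channels and one skip channel, two values each.
backbone = [[1.0, 2.0], [3.0, 4.0]]
skip = [[10.0, 20.0]]

# b > 1 amplifies the backbone; s < 1 attenuates the skip features.
fused = reweight_and_concat(backbone, skip, b=1.5, s=0.5)
print(fused)  # [[1.5, 3.0], [4.5, 6.0], [5.0, 10.0]]
```

Setting `b = s = 1.0` leaves the concatenated features unchanged, which corresponds to the unmodified model; since only inference-time activations are rescaled, no weights are touched and no retraining is involved.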