Abstract: Diffusion models have significantly advanced generative AI for creating and editing natural images. However, improving the quality of generated images remains of paramount interest. In this context, we propose a generic kurtosis concentration (KC) loss that can be readily applied to any standard diffusion model pipeline to improve image quality. Our motivation stems from the projected kurtosis concentration property of natural images, which states that natural images have nearly constant kurtosis values across different band-pass filtered versions of the image. To improve the quality of generated images, we reduce the gap between the highest and lowest kurtosis values across the band-pass filtered versions (e.g., via the Discrete Wavelet Transform (DWT)) of images. In addition, we propose a novel condition-agnostic perceptual guidance strategy during inference to further improve quality. We validate the proposed approach on four diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, (3) image super-resolution, and (4) blind face restoration. Integrating the proposed KC loss and perceptual guidance improves perceptual quality on all these tasks in terms of FID, MUSIQ score, and user evaluation. Code: https://github.com/aniket004/DiffNat.git
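The KC loss described above can be sketched in a few lines: compute the kurtosis of each band-pass (detail) subband of a wavelet decomposition and penalize the gap between the largest and smallest values. The following numpy-only sketch uses a single-level 2D Haar DWT for concreteness; the function names (`kurtosis`, `haar_dwt2`, `kc_loss`) are hypothetical and do not reflect the released code, which should be consulted for the actual implementation.

```python
import numpy as np

def kurtosis(x):
    # Plain fourth standardized moment (Pearson kurtosis); the max-min
    # gap in the KC loss is unaffected by the Fisher -3 convention.
    x = np.asarray(x, dtype=np.float64).ravel()
    mu = x.mean()
    var = x.var() + 1e-12  # epsilon guards against flat subbands
    return ((x - mu) ** 4).mean() / var ** 2

def haar_dwt2(img):
    # Single-level 2D Haar DWT on an even-sized grayscale image.
    # Returns (LL, LH, HL, HH); the last three are band-pass subbands.
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

def kc_loss(img):
    # Kurtosis concentration loss: gap between the highest and lowest
    # kurtosis across the band-pass subbands of the image.
    _, lh, hl, hh = haar_dwt2(img)
    ks = [kurtosis(s) for s in (lh, hl, hh)]
    return max(ks) - min(ks)
```

In practice this term would be computed on (decoded) generated images with a differentiable DWT and added to the standard diffusion training objective with a weighting factor; the sketch above only illustrates the quantity being minimized.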
Submission Length: Regular submission (no more than 12 pages of main content)
Code: https://github.com/aniket004/DiffNat.git
Supplementary Material: zip
Assigned Action Editor: ~C.V._Jawahar1
Submission Number: 4570