DIFFNAT: IMPROVING DIFFUSION IMAGE QUALITY USING NATURAL IMAGE STATISTICS

Aniket Roy; Maitreya Suin; Anshul Shah; Ketul Shah; Jiang Liu; Rama Chellappa

DIFFNAT: IMPROVING DIFFUSION IMAGE QUALITY USING NATURAL IMAGE STATISTICS

Aniket Roy, Maitreya Suin, Anshul Shah, Ketul Shah, Jiang Liu, Rama Chellappa

23 Sept 2023 (modified: 11 Feb 2024)Submitted to ICLR 2024EveryoneRevisionsBibTeX

Primary Area: generative models

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Keywords: diffusion model, natural image statistics

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Abstract: Diffusion models have advanced generative AI significantly in terms of editing and creating naturalistic images. However, while editing images using text-prompt or image guidance, some unnatural artefacts or effects can be generated by the diffusion model. This problem is more prominent in the context of few-shot personalization of text-to-image diffusion model, where the large diffusion model has to be finetuned from few examples of certain subject identity to produce edited images conditioned on text prompts. In this context, we propose a generic “naturalness” preserving loss function, viz., kurtosis concentration (KC) loss, which can be readily applied to any standard diffusion model pipeline to elevate the image quality. Our motivation stems from the projected kurtosis concentration property of natural images, which states that natural images have nearly constant kurtosis values across different band-pass versions of the image. In order to retain the “naturalness” of the generated images, we enforce reducing the gap between the highest and lowest kurtosis values across the band-pass versions (e.g., Discrete Wavelet Transform (DWT)) of images. Note that our approach does not require any additional guidance like classifer or classifer-free guidance in order to improve the image quality. We validate the proposed approach for three diverse tasks, viz., (1) personalized few-shot finetuning using text guidance, (2) unconditional image generation, and (3) image super-resolution. Integrating the proposed KC loss have improved the perceptual quality across all these tasks in terms of both FID, MUSIQ score and user evaluation.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7106

Loading