TL;DR: We propose Perceptual Manifold Guidance (PMG) in latent diffusion models to extract multi-scale, multi-timestep denoising U-Net features for zero-shot No-Reference Image Quality Assessment (NR-IQA), the first application of pretrained LDMs to NR-IQA.
Abstract: Despite recent advances in latent diffusion models, which generate high-dimensional image data and support a wide range of downstream tasks, their perceptual consistency has been little explored for No-Reference Image Quality Assessment (NR-IQA). In this paper, we hypothesize that latent diffusion models implicitly exhibit perceptually consistent local regions within the data manifold. We leverage this insight to guide on-manifold sampling using perceptual features and input measurements. Specifically, we propose Perceptual Manifold Guidance (PMG), an algorithm that uses pretrained latent diffusion models and perceptual quality features to obtain perceptually consistent multi-scale, multi-timestep feature maps from the denoising U-Net. We empirically demonstrate that these hyperfeatures correlate strongly with human perception in IQA tasks. Our method can be applied to any existing pretrained latent diffusion model and is straightforward to integrate. To the best of our knowledge, this is the first work to guide diffusion models with perceptual features for NR-IQA. Extensive experiments on IQA datasets show that our method, LGDM, achieves state-of-the-art performance, underscoring the superior generalization capabilities of diffusion models for NR-IQA tasks.
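To make the feature-extraction step concrete, the sketch below shows one way to collect multi-scale, multi-timestep denoising U-Net activations from a pretrained latent diffusion model using Hugging Face `diffusers`. It is an illustrative assumption, not the authors' released implementation: the checkpoint name, helper names, and chosen timesteps are hypothetical, and PMG's perceptual-feature guidance of on-manifold sampling is omitted; only the collection of U-Net decoder features at several scales and noise levels is shown.

```python
# Illustrative sketch only (assumed setup, not the paper's code): collect
# multi-scale, multi-timestep features from the denoising U-Net of a
# pretrained latent diffusion model via forward hooks.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
unet, vae, scheduler = pipe.unet, pipe.vae, pipe.scheduler

captured = []            # list of (block_name, timestep, feature_map)
current_t = {"t": None}  # lets the hooks know which timestep is being run

def make_hook(name):
    def hook(module, inputs, output):
        feat = output[0] if isinstance(output, tuple) else output
        captured.append((name, current_t["t"], feat.detach().float().cpu()))
    return hook

# Hook every decoder (up) block so each spatial scale is captured.
handles = [blk.register_forward_hook(make_hook(f"up_block_{i}"))
           for i, blk in enumerate(unet.up_blocks)]

@torch.no_grad()
def extract_hyperfeatures(image, timesteps=(50, 200, 500)):
    """image: (1, 3, H, W) tensor in [-1, 1]. Returns U-Net decoder features
    captured at several noise levels (the 'hyperfeatures' of the abstract)."""
    latents = vae.encode(image.half().to("cuda")).latent_dist.mean
    latents = latents * vae.config.scaling_factor
    # Unconditional text embedding so the cross-attention layers have an input.
    tokens = pipe.tokenizer("", padding="max_length",
                            max_length=pipe.tokenizer.model_max_length,
                            return_tensors="pt").input_ids.to("cuda")
    cond = pipe.text_encoder(tokens)[0]
    for t in timesteps:
        current_t["t"] = t
        t_tensor = torch.tensor([t], device="cuda")
        noisy = scheduler.add_noise(latents, torch.randn_like(latents), t_tensor)
        unet(noisy, t_tensor, encoder_hidden_states=cond)  # hooks fire here
    return captured
```

In a full pipeline, a quality estimate would then be computed zero-shot from these pooled feature maps; how PMG uses perceptual features to steer sampling toward perceptually consistent regions of the manifold is described in the paper itself.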
Lay Summary: Images on the internet often vary in quality because of noise, blur, or compression artifacts. Humans can easily tell when an image looks good or bad without seeing an original reference, but automated tools struggle to match human judgments. We discovered that advanced image-generation models, known as latent diffusion models, learn to represent images in a way that aligns with human perception. By tapping into these hidden representations, we can extract detailed features that relate directly to how people rate image quality. We introduce a simple algorithm, Perceptual Manifold Guidance (PMG), which uses these features to predict image quality without any additional training. Our method works out of the box on existing models and achieves top performance on a variety of standard benchmarks. This approach offers a practical and accurate way to assess image quality in applications ranging from photography and video streaming to medical imaging and remote sensing.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Perceptual Quality, Latent Diffusion Models, Training-free, Gen-AI, IQA
Submission Number: 7621