Image and Video Quality Assessment using Prompt-Guided Latent Diffusion Models for Cross-Dataset Generalization
Abstract: The design of image and video quality assessment (QA) algorithms is extremely important
to benchmark and calibrate user experience in modern visual systems. A major drawback
of the state-of-the-art QA methods is their limited ability to generalize across diverse image
and video datasets with reasonable distribution shifts. In this work, we leverage the
denoising process of diffusion models for generalized image QA (IQA) and video QA (VQA)
by understanding the degree of alignment between learnable quality-aware text prompts
and images or video frames. In particular, we learn cross-attention maps from intermediate
layers of the denoiser of latent diffusion models (LDMs) to capture quality-aware representations
of images or video frames. Since applying text-to-image LDMs to every frame of a video
is computationally expensive, we estimate quality only on a frame-rate sub-sampled
version of the original video. To compensate for the loss of motion information due
to frame-rate sub-sampling, we propose a novel temporal quality modulator. Our extensive
cross-database experiments across various user-generated, synthetic, low-light, frame-rate-variation,
ultra-high-definition, and streaming content-based databases show that our model
can achieve superior generalization in both IQA and VQA.
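Illustrative sketch of the idea described in the abstract: pooling cross-attention outputs from intermediate layers of an LDM denoiser as quality-aware features, built on the Hugging Face diffusers library. The checkpoint name, the fixed antonym prompts standing in for the learnable quality-aware prompts, the chosen timestep, and the pooling are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (assumptions flagged in comments): pool cross-attention outputs
# from intermediate UNet layers of a latent diffusion model as quality-aware features.
import torch
from diffusers import StableDiffusionPipeline

# Any Stable Diffusion 1.x/2.x checkpoint; the exact model ID is an illustrative choice.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet, vae, text_encoder, tokenizer = pipe.unet, pipe.vae, pipe.text_encoder, pipe.tokenizer

# The paper learns quality-aware prompts; fixed antonym prompts are a stand-in here.
prompts = ["a good photo", "a bad photo"]
tokens = tokenizer(prompts, padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
text_emb = text_encoder(tokens.input_ids)[0]          # (2, 77, hidden_dim)

# Hook the cross-attention ("attn2") modules inside the UNet denoiser.
features = []
def save_pooled(module, inputs, output):
    features.append(output.mean(dim=1))               # pool over spatial tokens
handles = [m.register_forward_hook(save_pooled)
           for name, m in unet.named_modules() if name.endswith("attn2")]

# Encode one image (or video frame) to latents and add noise at a fixed timestep.
image = torch.rand(1, 3, 512, 512)                    # placeholder frame in [0, 1]
latents = vae.encode(2 * image - 1).latent_dist.mean * vae.config.scaling_factor
t = torch.tensor([50])                                 # illustrative denoising step
noisy = pipe.scheduler.add_noise(latents, torch.randn_like(latents), t)

with torch.no_grad():
    unet(noisy.repeat(2, 1, 1, 1), t, encoder_hidden_states=text_emb)
for h in handles:
    h.remove()

# One pooled feature per cross-attention layer and per prompt; a lightweight
# regressor on these features could then predict the quality score.
quality_feat = torch.cat(features, dim=-1)             # (2, sum of layer widths)
print(quality_feat.shape)
```

In this sketch, the alignment between the prompts and the image is captured implicitly through the cross-attention features rather than through the attention maps themselves; the paper's method operates on the cross-attention maps and learns the prompts end-to-end.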
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We thank the action editor and reviewers for detailed comments on our paper. We make the following
updates to the revised manuscript:
1. Made claims regarding cross-dataset generalization more clear, and also changed the title to reflect this.
2. Provided a comparison of GenzIQA with two base variants of Stable Diffusion models, viz., v1.5 and v2, in Sec. 4.4.7 and Tab. 7.
3. Provided a precise explanation for the cause of generalization in GenzIQA and GenzVQA in the introduction.
4. Incorporated all reviewers’ comments and suggestions. In this revision, we include the latest feedback of Reviewer EQgr in Sec. 3.2 and 4.3.
Assigned Action Editor: ~Yu-Xiong_Wang1
Submission Number: 4964