PixelQA: Advancing Image and Pixel-Level Quality Assessment Using Large Multimodal Models

Published: 10 Nov 2024, Last Modified: 11 Apr 2025 | OpenReview Archive Direct Upload | Everyone | CC BY 4.0
Abstract: With the rapid advancement of large multimodal models (LMMs), LMM-based image quality assessment (IQA) methods have become better at evaluating and explaining the quality of visual content. However, existing methods primarily assess quality at the overall image level and overlook detailed local quality, which is essential for comprehensive visual understanding. In this work, we present PixelQA, the first framework designed to address both image-level and pixel-level visual quality perception by integrating LMMs with detailed quality analysis. We introduce a ground-truth-informed dataset construction approach and use it to expand the dataset to 420,000 samples covering both image-level and pixel-level IQA data. To better address resolution-related quality issues, our approach preserves image resolution during training and leverages multi-scale image features for pixel-level quality analysis. Experimental results demonstrate that PixelQA significantly surpasses other LMM-based IQA models. These advantages are further validated in real-world applications, including quality assessment and segmentation of real-world images.