Q-Ground: Image Quality Grounding with Large Multi-modality Models

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Oral · CC BY 4.0
Abstract: Recent advances in large multi-modality models (LMMs) have greatly improved the ability of image quality assessment (IQA) methods to evaluate and explain the quality of visual content. However, these advancements have mostly focused on overall quality assessment, while the detailed examination of local quality, which is crucial for comprehensive visual understanding, remains largely unexplored. In this work, we introduce **Q-Ground**, the first framework aimed at tackling fine-scale visual quality grounding by combining large multi-modality models with detailed visual quality analysis. Central to our contribution is the **QGround-100K** dataset, a novel resource containing 100k triplets of *(image, quality text, distortion segmentation)* to facilitate deep investigations into visual quality. The dataset comprises two parts: one with human-labeled annotations for accurate quality assessment, and another labeled automatically by LMMs such as GPT-4V, which improves the robustness of model training while reducing the cost of data collection. With the **QGround-100K** dataset, we propose an LMM-based method equipped with multi-scale feature learning to train models capable of performing both image quality answering and distortion segmentation based on text prompts. This dual capability not only refines the model's understanding of region-aware image quality but also enables it to interactively respond to complex, text-based queries about image quality and specific distortions. **Q-Ground** takes a step towards sophisticated visual quality analysis at a finer scale, establishing a new benchmark for future research in the area. Code and dataset will be made available.
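To make the triplet structure described above concrete, the following is a minimal Python sketch of how a QGround-100K record might be represented and parsed. All field names, distortion categories, and the `load_triplet` helper are illustrative assumptions for exposition, not the schema of the released dataset.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical distortion categories; the actual QGround-100K label set
# is defined by the released dataset, not by this sketch.
DISTORTION_TYPES: List[str] = [
    "blur", "noise", "compression", "overexposure", "underexposure",
]


@dataclass
class QGroundTriplet:
    """One (image, quality text, distortion segmentation) triplet.

    Field names are illustrative assumptions, not the dataset's schema.
    """
    image_path: str      # path to the distorted image
    quality_text: str    # free-form quality description / answer
    seg_mask_path: str   # per-pixel distortion segmentation map
    source: str          # "human" or "gpt4v" (auto-labeled split)


def load_triplet(record: dict) -> QGroundTriplet:
    """Build a triplet from a JSON-like record (assumed key names)."""
    return QGroundTriplet(
        image_path=record["image"],
        quality_text=record["quality_text"],
        seg_mask_path=record["distortion_seg"],
        source=record.get("source", "human"),
    )


if __name__ == "__main__":
    example = {
        "image": "images/0001.png",
        "quality_text": "The sky region is heavily over-exposed; the lower half is sharp.",
        "distortion_seg": "masks/0001.png",
        "source": "gpt4v",
    }
    print(load_triplet(example))
```

A record of this shape would supply both training signals named in the abstract: the quality text for image quality answering and the segmentation map for text-prompted distortion segmentation.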
Primary Subject Area: [Experience] Multimedia Applications
Secondary Subject Area: [Experience] Interactions and Quality of Experience
Relevance To Conference: Image Quality Assessment (IQA) holds pivotal importance to the ACM Multimedia conference, a leading forum for discussing innovations in multimedia technologies. IQA is integral to numerous facets of multimedia, including content analysis, compression, transmission, and human-computer interaction. It ensures that multimedia content meets the requisite standards for a high-quality user experience, enabling effective feature extraction for content recognition and efficient compression without significant quality degradation. Furthermore, in areas such as augmented and virtual reality, IQA is crucial for delivering immersive experiences. The conference also explores the role of IQA in emerging technologies like machine learning and AI, where the quality of images significantly influences algorithm performance. By fostering advancements in IQA, the ACM Multimedia conference contributes to enhancing multimedia applications across various domains, underscoring the critical role of high-quality multimedia content in the digital age.
Supplementary Material: zip
Submission Number: 1091