QPT-V2: Masked Image Modeling Advances Visual Scoring

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: Quality assessment and aesthetics assessment aim to evaluate the perceived quality and aesthetics of visual content. Current learning-based methods suffer greatly from the scarcity of labeled data and usually perform sub-optimally in terms of generalization. Masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification and detection). In this work, we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware PreTraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution for quality and aesthetics assessment. Specifically, QPT V2 incorporates the following key designs: to perceive high-level semantics and fine-grained details, the pretraining data is curated; to comprehensively encompass quality- and aesthetics-related factors, degradation is introduced; and to capture multi-scale quality and aesthetics information, the model structure is modified. Extensive experimental results on 11 downstream benchmarks show the superior performance of QPT V2 in comparison with current state-of-the-art approaches and other pretraining paradigms.
Primary Subject Area: [Experience] Interactions and Quality of Experience
Relevance To Conference: With the emergence of User-Generated Content (UGC) and Artificial Intelligence-Generated Content (AIGC), the factors related to quality and aesthetics in visual content have become increasingly complex, posing greater challenges for Image Quality Assessment (IQA), Video Quality Assessment (VQA), and Image Aesthetic Assessment (IAA) in enhancing users' Quality of Experience (QoE). Current learning-based IQA, VQA, and IAA methods suffer greatly from the scarcity of labeled data and usually perform sub-optimally in terms of generalization. Masked image modeling (MIM) has achieved noteworthy advancements across various high-level tasks (e.g., classification and detection). In this work, we take a novel perspective and investigate its capabilities in terms of quality- and aesthetics-awareness. To this end, we propose Quality- and aesthetics-aware PreTraining (QPT V2), the first pretraining framework based on MIM that offers a unified solution for quality and aesthetics assessment. Specifically, QPT V2 incorporates the following key designs: to perceive high-level semantics and fine-grained details, the pretraining data is curated; to comprehensively encompass quality- and aesthetics-related factors, degradation is introduced; and to capture multi-scale quality and aesthetics information, the model structure is modified. Extensive experimental results on 11 downstream benchmarks show the superior performance of QPT V2 in comparison with current state-of-the-art approaches and other pretraining paradigms.
Submission Number: 1649
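
The abstract names degradation and masked image modeling as ingredients of the pretraining but gives no implementation details. The following is only a generic sketch of how a degraded image could feed a masked-patch reconstruction loss; it is not the authors' method. Every specific here is an assumption made for illustration: the additive Gaussian noise used as the degradation, the 4x4 patch size, the 0.75 mask ratio, the mean-squared-error loss, the choice of the degraded patches as the reconstruction target, and all function names.

```python
import random


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch x patch tiles."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([image[top + i][left + j]
                            for i in range(patch) for j in range(patch)])
    return patches


def add_gaussian_noise(image, sigma, rng):
    """Hypothetical degradation: additive Gaussian noise, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def random_mask(num_patches, mask_ratio, rng):
    """Boolean list marking a random subset of patches as masked."""
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]


def masked_mse(pred, target, mask):
    """Mean squared error computed over the masked patches only."""
    total, count = 0.0, 0
    for p, t, m in zip(pred, target, mask):
        if m:
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += len(p)
    return total / count if count else 0.0


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]   # toy 8x8 image
degraded = add_gaussian_noise(image, sigma=0.1, rng=rng)       # assumed degradation
patches = patchify(degraded, patch=4)                          # 4 patches of 16 pixels
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)     # assumed mask ratio
pred = [[0.0] * len(p) for p in patches]                       # stand-in for a decoder output
loss = masked_mse(pred, patches, mask)
```

In a real pipeline `pred` would come from an encoder-decoder network that sees only the unmasked patches; the all-zero placeholder above exists only so the loss can be evaluated end to end.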