Align-IQA: Aligning Image Quality Assessment Models with Diverse Human Preferences via Customizable Guidance
Abstract: Aligning image quality assessment (IQA) models with diverse human preferences remains challenging, owing to the variability of preferences across different types of visual content, such as user-generated content and AI-generated content (AIGC). Although existing IQA methods achieve significant success on specific visual content by leveraging knowledge from pre-trained models, the intricate factors that affect final ratings and the specially designed network architectures of these methods leave gaps in their ability to accurately capture human preferences for novel visual content. To address this issue, we propose Align-IQA, a novel framework that generates visual quality scores aligned with diverse human preferences for different types of visual content. Align-IQA contains two key designs: (1) A customizable quality-aware guidance injection module. By injecting specializable quality-aware prior knowledge into general-purpose pre-trained models, the proposed module guides the extraction of quality-aware features and allows these features to be adjusted differently to match diverse human preferences for various types of visual content. (2) A multi-scale feature aggregation module. By simulating the multi-scale mechanism of the human visual system, the proposed module extracts a more comprehensive representation of quality-aware features from the perspective of human perception. Extensive experimental results demonstrate that Align-IQA achieves performance comparable to or better than SOTA methods. Notably, Align-IQA outperforms the previous best results on AIGC datasets, achieving PLCC of 0.890 (+3.73%) and 0.924 (+1.99%) on AGIQA-1K and AGIQA-3K, respectively. Additionally, Align-IQA reduces training parameters by 72.26% and inference overhead by 78.12% while maintaining SOTA performance.
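A minimal PyTorch-style sketch of the two modules described in the abstract is given below. All names (QualityGuidanceInjector, MultiScaleAggregator), dimensions, and the use of a frozen ViT-like backbone are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class QualityGuidanceInjector(nn.Module):
    """Injects learnable quality-aware prompt tokens into a frozen backbone's
    token sequence, so the guidance can be customized per content type
    (e.g. user-generated vs. AIGC). Hypothetical illustration only."""

    def __init__(self, embed_dim: int = 768, num_prompts: int = 8):
        super().__init__()
        # Customizable quality-aware prior: a learnable prompt set
        self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D) patch tokens from a general-purpose pre-trained model
        b = tokens.size(0)
        prompts = self.prompts.unsqueeze(0).expand(b, -1, -1)
        return torch.cat([prompts, tokens], dim=1)  # prepend guidance tokens


class MultiScaleAggregator(nn.Module):
    """Pools intermediate features from several backbone stages and fuses them,
    mimicking the multi-scale mechanism of the human visual system."""

    def __init__(self, embed_dim: int = 768, num_scales: int = 3):
        super().__init__()
        self.fuse = nn.Linear(embed_dim * num_scales, embed_dim)
        self.head = nn.Linear(embed_dim, 1)  # regress a single quality score

    def forward(self, stage_feats: list) -> torch.Tensor:
        # stage_feats: list of (B, N_i, D) token maps taken at different depths
        pooled = [f.mean(dim=1) for f in stage_feats]   # global pooling per scale
        fused = self.fuse(torch.cat(pooled, dim=-1))    # aggregate across scales
        return self.head(fused).squeeze(-1)             # (B,) quality scores


if __name__ == "__main__":
    injector = QualityGuidanceInjector()
    aggregator = MultiScaleAggregator()
    tokens = torch.randn(2, 196, 768)              # stand-in for frozen ViT patch tokens
    guided = injector(tokens)                      # (2, 204, 768)
    scores = aggregator([guided, guided, guided])  # mock multi-scale features
    print(scores.shape)                            # torch.Size([2])
```

Under this sketch, only the prompt parameters and the lightweight aggregation head would be trained while the backbone stays frozen, which is consistent with the reported reduction in training parameters.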
Primary Subject Area: [Experience] Interactions and Quality of Experience
Secondary Subject Area: [Content] Vision and Language
Relevance To Conference: The contributions of this work advance multimedia/multimodal processing by proposing quality-aware visual prompt tuning and multi-scale quality-aware feature aggregation for NR-IQA (VPMIQA), which helps multimedia applications deliver high-quality images, thereby better meeting user needs and improving the quality of experience.
Supplementary Material: zip
Submission Number: 4432