Large Multi-modality Model Assisted AI-Generated Image Quality Assessment

Published: 20 Jul 2024 · Last Modified: 21 Jul 2024 · MM 2024 Oral · CC BY 4.0
Abstract: Traditional deep neural network (DNN)-based image quality assessment (IQA) models leverage convolutional neural networks (CNNs) or Transformers to learn quality-aware feature representations, achieving commendable performance on natural scene images. However, when applied to AI-generated images (AGIs), these DNN-based IQA models exhibit subpar performance. This shortfall is largely due to the semantic inaccuracies present in some AGIs, caused by the uncontrollable nature of the generation process. The capability to discern semantic content thus becomes crucial for assessing the quality of AGIs. Traditional DNN-based IQA models, constrained by limited parameter counts and training data, struggle to capture complex fine-grained semantic features, making it difficult to judge the existence and coherence of semantic content across the entire image. To address this gap in semantic content perception, we introduce a large ***M***ulti-modality model ***A***ssisted ***A***I-***G***enerated ***I***mage ***Q***uality ***A***ssessment (***MA-AGIQA***) model, which uses carefully designed text prompts to elicit semantic information from a large multi-modality model and extract semantic vectors. It then employs a mixture-of-experts (MoE) structure to dynamically integrate this semantic information with the quality-aware features extracted by a traditional DNN-based IQA model. Comprehensive experiments on two AI-generated content datasets, AIGCQA-20k and AGIQA-3k, show that MA-AGIQA achieves state-of-the-art performance and demonstrates superior generalization in assessing the quality of AGIs. The code will be made available.
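The abstract describes fusing MLLM-derived semantic vectors with quality-aware features from a DNN-based IQA backbone via a mixture-of-experts structure. As a rough illustration only (the paper's actual architecture, dimensions, and gating design are not specified here), a generic gated MoE fusion of two feature vectors can be sketched as follows; all names, shapes, and the softmax gating scheme are assumptions:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def moe_fuse(quality_feat, semantic_feat, W_gate, experts):
    """Hypothetical MoE fusion: a gating network weights per-expert
    projections of the concatenated quality and semantic features."""
    z = np.concatenate([quality_feat, semantic_feat])
    gate = softmax(W_gate @ z)                      # one weight per expert, sums to 1
    outputs = np.stack([W_e @ z for W_e in experts])  # (n_experts, d_out)
    return gate @ outputs                           # convex combination of expert outputs

# Toy dimensions, chosen arbitrarily for the sketch.
rng = np.random.default_rng(0)
d_q, d_s, d_out, n_exp = 8, 8, 4, 2
q = rng.standard_normal(d_q)                        # stand-in for DNN quality features
s = rng.standard_normal(d_s)                        # stand-in for MLLM semantic vector
W_gate = rng.standard_normal((n_exp, d_q + d_s))
experts = [rng.standard_normal((d_out, d_q + d_s)) for _ in range(n_exp)]
fused = moe_fuse(q, s, W_gate, experts)             # fused representation, shape (4,)
```

In practice the fused representation would feed a quality-regression head; the point of the gating is that the relative weight given to semantic versus quality-driven experts is computed per input rather than fixed.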
Primary Subject Area: [Experience] Interactions and Quality of Experience
Relevance To Conference: Our research introduces a novel approach to enhancing deep neural networks' understanding of the semantic content of AI-generated images in order to improve image quality assessment (IQA). By integrating multimodal large language models (MLLMs) with traditional networks, we achieve state-of-the-art results on AI-generated image datasets and demonstrate strong generalization ability. This contributes to the multimedia/multimodal processing field by offering a method to better assess AI-generated image quality, paving the way for broader application in multimedia tasks and inspiring further integration of MLLMs in AI-generated content evaluation.
Submission Number: 4152