Attribute-Driven Multimodal Hierarchical Prompts for Image Aesthetic Quality Assessment

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: Image Aesthetic Quality Assessment (IAQA) aims to simulate users' visual perception to judge the aesthetic quality of images. In social media, users' aesthetic experiences are often reflected in their textual comments on the aesthetic attributes of images. To fully exploit the attribute information that users perceive when evaluating image aesthetic quality, this paper proposes an IAQA method based on attribute-driven multimodal hierarchical prompts. Unlike existing IAQA methods that rely on multimodal pre-training or straightforward prompts for model learning, the proposed method leverages attribute comments and quality-level text templates to hierarchically learn the aesthetic attributes and quality of images. Specifically, we first leverage users' aesthetic attribute comments to perform prompt learning on images. The learned attribute-driven multimodal features comprehensively capture the semantic information of the image aesthetic attributes perceived by users. Then, we construct text templates for different aesthetic quality levels to further facilitate prompt learning through semantic information related to the aesthetic quality of images. In this way, the proposed method explicitly simulates users' aesthetic judgment of images to obtain more precise aesthetic quality predictions. Experimental results demonstrate that the proposed IAQA method based on hierarchical prompts significantly outperforms existing methods on multiple IAQA databases. Our source code is provided in the supplementary material, and we will release all source code along with this paper.
Primary Subject Area: [Experience] Interactions and Quality of Experience
Relevance To Conference: This paper proposes a novel image aesthetic quality assessment (IAQA) method based on attribute-driven multimodal hierarchical prompts, which is valuable for advancing image processing and multimedia quality of experience in applications such as image recommendation, photo retrieval, and image enhancement. Different from existing IAQA methods, the proposed multimodal hierarchical prompts method leverages joint text prompts of aesthetic attributes and different quality levels to explicitly simulate how social media users judge image aesthetic quality, resulting in an IAQA model that learns users' quality experience of image aesthetics more efficiently. Extensive experimental results demonstrate that the proposed IAQA method achieves very promising performance, providing a novel insight into measuring the image aesthetic quality of users' experience through multimodal prompt learning.
Supplementary Material: zip
Submission Number: 2873
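Illustrative sketch (not the authors' implementation): the abstract describes matching images against text templates for different aesthetic quality levels. One common way to turn such matching into a score, assumed here purely for illustration, is to compare an image embedding with one text embedding per quality level, apply a softmax over the cosine similarities, and take the expected level. The template wording, the five-level scale, the temperature value, and the function names below are all hypothetical, and random vectors stand in for the embeddings a real image/text encoder would produce.

```python
import math
import random

# Hypothetical quality-level templates; the paper's actual wording is not given here.
QUALITY_TEMPLATES = [
    "a photo of bad aesthetic quality",
    "a photo of poor aesthetic quality",
    "a photo of fair aesthetic quality",
    "a photo of good aesthetic quality",
    "a photo of excellent aesthetic quality",
]
# Assumed numeric score attached to each level (1 = lowest, 5 = highest).
LEVEL_SCORES = [1.0, 2.0, 3.0, 4.0, 5.0]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def level_probabilities(image_feat, template_feats, temperature=0.07):
    """Softmax over temperature-scaled image-to-template cosine similarities."""
    logits = [cosine(image_feat, t) / temperature for t in template_feats]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_quality(image_feat, template_feats, level_scores=LEVEL_SCORES):
    """Probability-weighted average of the level scores."""
    probs = level_probabilities(image_feat, template_feats)
    return sum(p * s for p, s in zip(probs, level_scores))


# Stand-in embeddings: random vectors replace real encoder outputs.
rng = random.Random(0)
template_feats = [[rng.gauss(0, 1) for _ in range(16)] for _ in QUALITY_TEMPLATES]
# An image embedding placed close to the "good" template (index 3).
image_feat = [x + 0.1 * rng.gauss(0, 1) for x in template_feats[3]]

probs = level_probabilities(image_feat, template_feats)
score = expected_quality(image_feat, template_feats)
print([round(p, 3) for p in probs], round(score, 3))
```

In this sketch the predicted score always lies between the lowest and highest level score, and an image embedding identical to one template's embedding assigns that level the largest probability. The attribute-comment stage described in the abstract is not modeled here.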