UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

Hantao Zhou; Longxiang Tang; Rui Yang; Guanyi Qin; Yan Zhang; Runze Hu; Xiu Li

UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment

Hantao Zhou, Longxiang Tang, Rui Yang, Guanyi Qin, Yan Zhang, Runze Hu, Xiu Li

20 Sept 2024 (modified: 23 Jan 2025)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Image assessment, Vision-language learning, Multimodal large language models

Abstract: Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal. Despite distinct learning objectives, they have underlying interconnectedness due to consistent human assessment perception. Existing unified methods typically combine datasets of two tasks for regression training directly, which fail to learn mutually beneficial representations shared by both tasks explicitly. To confront this challenge, we propose \textbf{Uni}fied vision-language pre-training of \textbf{Q}uality and \textbf{A}esthetics (\textbf{UniQA}), to extract useful and common representations from two tasks, thereby benefiting them simultaneously. Unfortunately, the lack of text in the IQA datasets and the textual noise in the IAA datasets pose severe challenges for multimodal pre-training. To address this, we (1) utilize multimodal large language models (MLLMs) to generate high-quality text descriptions; (2) use the generated text for IAA as metadata to purify noisy IAA data. To effectively adapt the pre-trained UniQA to downstream tasks, we further propose a lightweight adapter that utilizes versatile cues to fully exploit the extensive knowledge of the pre-trained model. Extensive experiments show that our approach achieves state-of-the-art performance on both IQA and IAA tasks, while also demonstrating exceptional few-label image assessment capabilities.

Primary Area: applications to computer vision, audio, language, and other modalities

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 2022

Loading