GQA: Generation Quality Assessment of AIGC Videos based on Human Assessment: Dataset, Scoring and Explanation

ICLR 2026 Conference Submission 19008 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: AIGC, Video Generation Quality Assessment, Dataset
Abstract: Recent advances have significantly elevated the quality of AI-generated videos; however, existing evaluation metrics still struggle to align with human perceptual judgments. While prior work has repurposed deep learning models or borrowed algorithms from other domains to assess generative content, their outputs often diverge noticeably from real human evaluations. To address this gap, we introduce the GQA dataset, a human-aligned benchmark comprising: (1) videos generated by dozens of state-of-the-art models, including members of the VAE and Diffusion Model (DM) families; (2) dozens of refined evaluation metrics systematically organized into three core dimensions: Video-Text Consistency, Realism, and Traditional Quality; and (3) a prompt-adaptive metric selection mechanism that keeps evaluations contextually relevant, avoiding misaligned assessments across semantically unrelated dimensions. GQA enables more accurate, interpretable, and perception-aware evaluation of AI-generated video content.
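The prompt-adaptive metric selection idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the dimension names follow the abstract, but the keyword lists, function name, and matching logic are hypothetical, since the abstract does not specify the actual mechanism.

```python
# Hypothetical sketch of prompt-adaptive metric selection: given a text
# prompt, keep only the evaluation dimensions that are semantically
# relevant to it. Keyword lists and matching logic are illustrative
# assumptions, not the authors' actual implementation.

DIMENSIONS = {
    # Trigger keywords are assumed; empty list = always applicable.
    "video_text_consistency": ["depict", "show", "scene", "object"],
    "realism": ["photorealistic", "realistic", "natural"],
    "traditional_quality": [],  # sharpness, noise, etc., always evaluated
}

def select_metrics(prompt: str) -> list[str]:
    """Return the evaluation dimensions relevant to a given prompt."""
    prompt_lower = prompt.lower()
    selected = []
    for dim, keywords in DIMENSIONS.items():
        # A dimension with no trigger keywords is treated as always relevant.
        if not keywords or any(kw in prompt_lower for kw in keywords):
            selected.append(dim)
    return selected

print(select_metrics("A photorealistic scene of a dog on a beach"))
print(select_metrics("cartoon cat"))
```

In this sketch, a stylized prompt such as "cartoon cat" would skip the Realism dimension, which is one plausible way to avoid the misaligned assessments the abstract mentions.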
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 19008