GQA: Generation Quality Assessment of AIGC Videos based on Human Assessment: Dataset, Scoring and Explanation

ICLR 2026 Conference Submission 19008 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: AIGC, Video Generation Quality Assessment, Dataset
Abstract: Recent advances have significantly elevated the quality of AI-generated videos; however, existing evaluation metrics still struggle to align with human perceptual judgments. While prior work has repurposed deep learning models or borrowed algorithms from other domains to assess generative content, their outputs often diverge noticeably from real human evaluations. To address this gap, we introduce the GQA dataset, a human-aligned benchmark comprising: (1) videos generated by dozens of state-of-the-art models, including members of the VAE and Diffusion Model (DM) families; (2) dozens of refined evaluation metrics systematically organized into three core dimensions: Video-Text Consistency, Realism, and Traditional Quality; and (3) a prompt-adaptive metric selection mechanism that keeps evaluations contextually relevant, avoiding misaligned assessments across semantically unrelated dimensions. GQA enables more accurate, interpretable, and perception-aware evaluation of AI-generated video content.
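The prompt-adaptive metric selection idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the dimension names follow the abstract, but the keyword lists, function name, and matching logic are hypothetical, since the abstract does not specify the actual mechanism.

```python
# Hypothetical sketch of prompt-adaptive metric selection: given a text
# prompt, keep only the evaluation dimensions that are semantically
# relevant to it. Keyword lists and matching logic are illustrative
# assumptions, not the authors' actual implementation.

DIMENSIONS = {
    # Trigger keywords are assumed; empty list = always applicable.
    "video_text_consistency": ["depict", "show", "scene", "object"],
    "realism": ["photorealistic", "realistic", "natural"],
    "traditional_quality": [],  # sharpness, noise, etc., always evaluated
}

def select_metrics(prompt: str) -> list[str]:
    """Return the evaluation dimensions relevant to a given prompt."""
    prompt_lower = prompt.lower()
    selected = []
    for dim, keywords in DIMENSIONS.items():
        # A dimension with no trigger keywords is treated as always relevant.
        if not keywords or any(kw in prompt_lower for kw in keywords):
            selected.append(dim)
    return selected

print(select_metrics("A photorealistic scene of a dog on a beach"))
print(select_metrics("cartoon cat"))
```

In this sketch, a stylized prompt such as "cartoon cat" would skip the Realism dimension, which is one plausible way to avoid the misaligned assessments the abstract mentions.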
Supplementary Material: zip
Primary Area: datasets and benchmarks
Submission Number: 19008