The Evaluation Imperative for Video Generative Models: A Survey on Metrics, Benchmarks, and Trustworthiness
Submission Type: Full Papers (up to 8 pages)
Keywords: Video Generative Models, Evaluation Benchmarks, Spatiotemporal Metrics, AI Trustworthiness
Abstract: The rapid evolution of video generative models has shifted the research focus from basic synthesis to high-fidelity, physically grounded content. However, traditional evaluation frameworks remain largely insufficient, often failing to capture the spatiotemporal consistency and semantic nuances required for verifiable quality. This survey provides a critical analysis of the AI-generated video evaluation (AIGVE) field, categorizing methodologies into metric-based, human-involved, and model-centered paradigms. We examine how architectural shifts, primarily toward diffusion transformers and autoregressive frameworks, have redefined evaluation requirements. We trace the evolution of metrics from frame-level heuristics to perceptually grounded spatiotemporal measures, and we analyze benchmarks spanning text-to-video, specialized conditional generation, and long-form storytelling. The review also addresses the critical dimensions of human preference alignment, physical world simulation, and safety-aware assessment, while exploring the impact of generative priors on downstream applications such as video editing and compression. By identifying current limitations, we propose a path toward unified and trustworthy evaluation frameworks for the next generation of video foundation models.
Submission Number: 16