The Evaluation Imperative for Video Generative Models: A Survey on Metrics, Benchmarks, and Trustworthiness
Submission Type: Full Papers (up to 8 pages)
Keywords: Video Generative Models, Evaluation Benchmarks, Spatiotemporal Metrics, AI Trustworthiness
Abstract: The rapid evolution of video generative models has shifted the research focus from basic synthesis to high-fidelity, physically grounded content. However, traditional evaluation frameworks remain largely insufficient, often failing to capture the spatiotemporal consistency and semantic nuances required for verifiable quality. This survey provides a critical analysis of the AI-generated video evaluation (AIGVE) field, categorizing methodologies into metric-based, human-involved, and model-centered paradigms. We examine how architectural shifts, primarily toward diffusion transformers and autoregressive frameworks, have redefined evaluation requirements. We trace the evolution of metrics from frame-level heuristics to perceptually grounded spatiotemporal measures, and we analyze benchmarks spanning text-to-video, specialized conditional generation, and long-form storytelling. The review also addresses the critical dimensions of human preference alignment, physical world simulation, and safety-aware assessment, while exploring the impact of generative priors on downstream applications such as video editing and compression. By identifying current limitations, we propose a path toward unified and trustworthy evaluation frameworks for the next generation of video foundation models.
Submission Number: 16