VACT: A Video Automatic Causal Testing System and a Benchmark

Haotong Yang; Qingyuan Zheng; Yunjian Gao; Yongkun Yang; Yangbo He; Zhouchen Lin; Muhan Zhang

Published: 06 Mar 2025, Last Modified: 01 May 2025, SCSL @ ICLR 2025, CC BY 4.0

Track: regular paper (up to 6 pages)

Keywords: video generation model, world simulator, model evaluation, causal inference, shortcut detection

TL;DR: We propose an automatic system for modeling causal systems and testing the causal behavior of video generation models.

Abstract: With the rapid advancement of text-conditioned Video Generation Models (VGMs), the quality of generated videos has improved significantly, bringing these models closer to functioning as "world simulators" and making real-world-level video generation more accessible and cost-effective. However, the generated videos often contain factual inaccuracies and show a lack of understanding of fundamental physical laws. Previous studies have highlighted this issue in limited domains through manual analysis, but a comprehensive solution has not yet been established, primarily because there is no generalized, automated approach for modeling and assessing the causal reasoning of these models across diverse scenarios. To address this gap, we propose an automated framework for modeling, evaluating, and measuring the causal understanding of VGMs in real-world scenarios. By combining causal analysis techniques with a carefully designed large language model (LLM) assistant, our system can assess the causal behavior of models in various contexts without human annotation, which gives it strong generalization and scalability. Additionally, we introduce multi-level causal evaluation metrics that provide a detailed analysis of the causal performance of VGMs. As a demonstration, we use our framework to benchmark several prevailing VGMs, offering insight into their causal reasoning capabilities. Our work lays a foundation for systematically addressing the causal understanding deficiencies of VGMs and contributes to improving their reliability and real-world applicability.

Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.

Submission Number: 37