Impossible Videos

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY-NC 4.0
TL;DR: This work presents a comprehensive benchmark of impossible videos and text prompts, aiming to evaluate state-of-the-art video generation and understanding models.
Abstract: Synthetic videos are now widely used to compensate for the scarcity and limited diversity of real-world videos. Current synthetic datasets primarily replicate real-world scenarios, leaving impossible, counterfactual, and anti-reality video concepts underexplored. This work aims to answer two questions: 1) Can today's video generation models effectively follow prompts to create impossible video content? 2) Are today's video understanding models good enough to understand impossible videos? To this end, we introduce *IPV-Bench*, a novel benchmark designed to evaluate and foster progress in video understanding and generation. *IPV-Bench* is underpinned by a comprehensive taxonomy encompassing 4 domains and 14 categories. It features diverse scenes that defy physical, biological, geographical, or social laws. Based on the taxonomy, a prompt suite is constructed to evaluate video generation models, challenging their prompt-following and creativity capabilities. In addition, a video benchmark is curated to assess Video-LLMs on their ability to understand impossible videos, which particularly requires reasoning about temporal dynamics and world knowledge. Comprehensive evaluations reveal limitations of current video models and offer insights for future directions, paving the way for next-generation video models.
Lay Summary: We introduce a benchmark of "Impossible Videos" that defy physical or commonsense laws, such as snow in the tropics or objects moving on their own. Current AI models struggle with these cases. Our work reveals their limitations and encourages the development of video models with stronger reasoning and world knowledge.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://showlab.github.io/Impossible-Videos/
Primary Area: Applications->Computer Vision
Keywords: Videos, Benchmark, Evaluation, Impossible Videos, Counterfactual, Anti-reality
Submission Number: 1560