Keywords: video generation, world models, evaluation benchmark
TL;DR: We measure the physical time scale of video motion.
Abstract: While recent generative video models have achieved remarkable visual realism and are increasingly explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these motions in a consistent real-world time scale. This temporal ambiguity stems from the common practice of training on videos with widely different real-world speeds after forcing them into standardized frame rates. As a result, generated sequences often exhibit ambiguous, unstable, and uncontrollable physical motion speeds, a failure mode we term chronometric hallucination. To address this issue, we propose Visual Chronometer, a predictor that recovers the Physical Frames Per Second (PhyFPS) directly from the visual dynamics of an input video. Trained via controlled temporal resampling, our method estimates the true temporal scale implied by motion itself, bypassing unreliable metadata. To systematically quantify this problem, we establish two benchmarks, PhyFPS-Bench-Real and PhyFPS-Bench-Gen. Our evaluations reveal that state-of-the-art video generators suffer from substantial PhyFPS misalignment and temporal instability. Finally, we show that applying PhyFPS-based corrections significantly improves the human-perceived naturalness of AI-generated videos.
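The abstract says the predictor is trained via controlled temporal resampling but does not spell out the procedure. As an illustration only, the sketch below shows one way such a training pair could be built, assuming the source clip has a trusted capture rate and that keeping every `stride`-th frame yields a clip whose PhyFPS is the capture rate divided by the stride. The function names `resample_with_label` and `apparent_speed` are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def resample_with_label(frames, capture_fps, stride):
    """Keep every `stride`-th frame and return (clip, phyfps_label).

    Assumption (not from the paper): the kept frames cover the same
    real-world duration, so the clip carries capture_fps / stride
    frames per physical second.
    """
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    clip = frames[::stride]
    phyfps = Fraction(capture_fps) / stride
    return clip, phyfps


def apparent_speed(container_fps, phyfps):
    """Playback speed relative to real time when a clip with the given
    PhyFPS is stored at a standardized container frame rate."""
    return Fraction(container_fps) / Fraction(phyfps)


# Stand-in for one second of frames captured at 240 fps.
frames = list(range(240))
clip, label = resample_with_label(frames, capture_fps=240, stride=4)
speed = apparent_speed(container_fps=24, phyfps=label)
print(len(clip), label, speed)
```

Under these assumptions the resampled clip has a PhyFPS of 60, and storing it at a standardized 24 fps would play it back at 2/5 of real-time speed. This is the kind of mismatch between container frame rate and physical time scale that the abstract describes.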
Submission Number: 6