Evaluating Newtonian Mechanics in Video Generative Models with Real Physical Systems

Published: 30 Apr 2026, Last Modified: 24 Jun 2026
Venue: ICML 2026 regular
License: CC BY 4.0
TL;DR: Morpheus, a physics-informed evaluation framework for assessing adherence to Newtonian Mechanics in Video Generative Models
Abstract: Recent advances in image and video generation raise hopes that these models possess world-modeling capabilities: the ability to generate realistic, physically plausible videos. This could revolutionize applications in robotics, autonomous driving, and scientific simulation. However, before treating these models as world models, we must ask: do they adhere to physical laws? Current evaluation methods rely on subjective judgments or on matching trajectories against a single reference, which limits their usefulness for assessing physical reasoning, since many different generations can be physically plausible. We therefore introduce **Morpheus**, one of the first physics-informed evaluation frameworks for measuring how well video generation models comprehend Newtonian dynamics. **Morpheus** features 130 real-world videos of physical phenomena governed by conservation laws. Using these videos as conditioning for video generation, we assess physical plausibility with interpretable metrics computed against the conservation laws known to hold in each physical setting, building on advances in physics-informed neural networks and vision-language foundation models. Importantly, **Morpheus** targets controlled Newtonian rigid-body settings to enable quantitative checks. Our findings reveal that even with advanced prompting and video conditioning, contemporary models struggle to encode physical principles despite generating aesthetically pleasing videos. Code and data are available [here](https://github.com/physics-from-video/Morpheus).
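As a rough illustration of what a metric "computed against a conservation law" can look like, the sketch below measures how much the mechanical energy of a tracked falling object drifts over a clip. It is a minimal sketch under stated assumptions, not the Morpheus implementation: the function name `energy_drift`, the constant `G`, and the assumptions that heights have already been converted from pixels to metres and that air drag is negligible are all ours.

```python
import numpy as np

G = 9.81  # assumed gravitational acceleration in m/s^2

def energy_drift(y, fps, g=G):
    """Hypothetical metric: maximum relative drift of mechanical energy per
    unit mass for a tracked vertical trajectory.

    y   -- heights in metres (upward positive), one per frame; assumes a
           pixel-to-metre calibration was already applied
    fps -- frame rate of the clip
    """
    y = np.asarray(y, dtype=float)
    dt = 1.0 / fps
    # Finite-difference velocity; edge_order=2 keeps the endpoints accurate.
    v = np.gradient(y, dt, edge_order=2)
    # Potential plus kinetic energy per unit mass, frame by frame.
    e = g * y + 0.5 * v ** 2
    # Normalise by the initial energy (heights measured from a fixed floor).
    scale = abs(e[0]) if e[0] != 0 else 1.0
    return float(np.max(np.abs(e - e[0])) / scale)

# Usage: a 2 m drop filmed for 0.5 s at 30 fps.
t = np.arange(16) / 30.0
y_physical = 2.0 - 0.5 * G * t ** 2  # correct free fall
y_slowed = 2.0 - 0.25 * G * t ** 2   # falls at half the correct acceleration
print(energy_drift(y_physical, fps=30))  # close to 0
print(energy_drift(y_slowed, fps=30))    # about 0.15
```

For ideal free fall, potential and kinetic energy trade off exactly and the drift stays near zero, whereas the slowed clip loses energy with no physical cause. Note that the check accepts any trajectory consistent with the law rather than a single reference motion.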
Lay Summary: AI video generators such as Sora, Veo, and Kling now produce strikingly realistic clips, fueling hopes that they could serve as "world models" that understand how the physical world works, with uses in robotics and self-driving cars. But a convincing video is not always a physically correct one: a ball might bounce higher than the height it was dropped from, breaking physics while still looking believable. Existing tests either use AI judges trained to rate videos as a person would for physical common sense and prompt-following, or compare against a single "correct" reference video; as a result, they miss real physics errors or unfairly reject valid alternatives. We built **Morpheus**, a framework that checks whether generated videos truly obey physical laws instead of merely looking plausible. We filmed 130 real-world experiments of everyday physics (falling objects, projectiles, collisions, pendulums) and asked the models to continue these scenes. We then automatically track the objects' motion and test it against fundamental laws such as conservation of energy and momentum, rather than against a single right answer. We find that even today's most impressive models routinely violate basic physics. **Morpheus** offers an objective way to measure this gap, an essential step before trusting these models as real-world simulators.
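The bouncing-ball example can be turned into a simple law-based check. The sketch below is a hypothetical illustration, not code from the Morpheus repository: it assumes a smoothed, noise-free height trace of a ball released from rest, with heights measured upward from the floor, and the helper names `bounce_peaks` and `gains_energy` and the 2% tolerance are our own choices.

```python
import numpy as np

def bounce_peaks(y):
    """Indices of local maxima (rebound apexes) in a height trace y."""
    y = np.asarray(y, dtype=float)
    return [i for i in range(1, len(y) - 1) if y[i - 1] < y[i] >= y[i + 1]]

def gains_energy(y, tol=0.02):
    """True if any rebound apex exceeds the apex before it (the drop height
    for the first bounce) by more than the relative tolerance tol.

    Assumes the ball starts at rest at y[0] and that the trace has been
    smoothed; raw tracking noise would create spurious peaks.
    """
    y = np.asarray(y, dtype=float)
    apexes = [y[0]] + [y[i] for i in bounce_peaks(y)]
    # Without an energy source, each bounce can only reach a lower apex.
    return any(b > a * (1 + tol) for a, b in zip(apexes, apexes[1:]))

# Usage: two hand-made height traces (metres), one per video frame.
plausible = [1.0, 0.8, 0.4, 0.0, 0.3, 0.5, 0.6, 0.5, 0.3, 0.0, 0.2, 0.3, 0.2, 0.0]
implausible = [1.0, 0.5, 0.0, 0.6, 1.2, 0.6, 0.0]  # rebounds above drop height
print(gains_energy(plausible))    # False
print(gains_energy(implausible))  # True
```

Because the check only requires that no rebound rise above the apex before it, any of the many physically valid bounce sequences passes, which is the sense in which a law-based test avoids comparing against a single right answer.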
Link To Code: https://github.com/physics-from-video/Morpheus
Primary Area: Deep Learning->Foundation Models
Keywords: Video Generation, Physics Evaluation
Submission Number: 34353