Keywords: world models, temporal reversal asymmetry, intuitive physics, video foundation models, physics benchmark
TL;DR: We introduce Temporal Reversal Asymmetry (TRA), which compares a video model's prediction loss on forward versus time-reversed videos, and show that, of the four models tested, only V-JEPA2 exhibits appropriate asymmetry. This suggests that latent prediction captures temporal causality better.
Abstract: World models that understand physics should recognize the thermodynamic arrow of time: some processes look physically plausible when reversed, while others do not. We propose Temporal Reversal Asymmetry (TRA), which compares a model's prediction loss on forward versus time-reversed videos. We evaluated four video models on 380 physics simulations and found that V-JEPA2's TRA varies continuously with dissipation strength. It peaks at intermediate values (restitution 0.5, damping 2.0), where energy loss remains visible throughout the video, and approaches zero both for genuinely reversible processes and for motion that stops too quickly. This non-monotonic relationship suggests that V-JEPA2 has learned to recognize irreversible processes from their visual dynamics rather than by memorizing categories. In contrast, VideoMAE V2 shows inverted (negative) TRA, while MVD and Hiera show near-zero TRA regardless of physics type, suggesting that latent prediction captures temporal causality more faithfully than pixel reconstruction or distillation. This work offers a training-free, physics-grounded probe that complements existing world model benchmarks.
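As a rough illustration of the TRA idea, the sketch below computes the difference between a predictor's loss on a time-reversed clip and on the original clip. Several assumptions are made here that the abstract does not specify. The sign convention (positive when the reversed clip is harder to predict) is inferred from the phrase "inverted TRA (negative asymmetry)". A 1D position trajectory stands in for a video. The hypothetical `decel_predictor_loss` replaces a real model's loss (for V-JEPA2 that would be its latent prediction loss, which is not reproduced here). The names `reverse_time`, `decel_predictor_loss`, and `tra` are illustrative only, and any normalization used in the paper is not modeled.

```python
import numpy as np


def reverse_time(clip: np.ndarray) -> np.ndarray:
    # Reverse along the first (time) axis; works for (T,), (T, H, W) or (T, H, W, C) arrays.
    return np.ascontiguousarray(clip[::-1])


def decel_predictor_loss(traj: np.ndarray, r: float = 0.8) -> float:
    # Hypothetical stand-in for a video model's prediction loss (assumption, not the paper's model).
    # It predicts each next displacement as r * previous displacement, i.e. it "expects"
    # dissipative, decelerating motion, and returns the mean squared prediction error.
    vel = np.diff(traj)
    pred = r * vel[:-1]
    return float(np.mean((vel[1:] - pred) ** 2))


def tra(loss_fn, clip: np.ndarray) -> float:
    # Assumed sign convention: TRA = loss(reversed) - loss(forward),
    # so TRA > 0 means the time-reversed clip is harder to predict.
    return loss_fn(reverse_time(clip)) - loss_fn(clip)


if __name__ == "__main__":
    t = np.arange(20)
    # Damped (irreversible-looking) motion: speed decays by 0.8 per step.
    damped = np.concatenate([[0.0], np.cumsum(2.0 * 0.8 ** t[:-1])])
    # Constant-velocity (reversible-looking) motion.
    constant = 1.5 * t.astype(float)
    print("TRA damped:  ", tra(decel_predictor_loss, damped))
    print("TRA constant:", tra(decel_predictor_loss, constant))
```

With this toy predictor, TRA is positive for the damped trajectory (reversed, it speeds up, which the predictor does not expect) and essentially zero for constant velocity (both directions look alike). This loosely mirrors the asymmetry the abstract describes, but it does not reproduce the paper's protocol.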
Submission Number: 91