Keywords: Video generation, Diffusion model
Abstract: Current video diffusion models generate visually compelling content but often violate
basic laws of physics, producing subtle artifacts like rubber-sheet deformations and
inconsistent object motion. We introduce a frequency-domain physics prior that improves
motion plausibility without modifying model architectures. Our method decomposes common
rigid motions (translation, rotation, scaling) into lightweight spectral losses,
requiring only 2.7% of frequency coefficients while preserving 97%+ of spectral energy.
Applied to Open-Sora, MVDIT, and Hunyuan, our approach improves both motion accuracy and action recognition on OpenVID-1M by ~11% on average (relative) while maintaining visual quality; it also reduces warping error by 22--37% (depending on the backbone) and improves temporal consistency scores. User studies show a 74--83% preference for our physics-enhanced videos. These results indicate that simple, global spectral cues are an effective drop-in regularizer for physically plausible motion in video diffusion.
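The truncated spectral loss described in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the authors' implementation: the function name, the per-frame difference signal, and the magnitude-based coefficient selection are all assumptions; only the idea of comparing a small fraction (~2.7%) of frequency coefficients comes from the abstract.

```python
# Hypothetical sketch of a frequency-domain motion loss: compare truncated
# 2D FFTs of consecutive-frame differences, keeping only a small fraction
# of coefficients (the abstract reports ~2.7%). Illustrative only.
import numpy as np

def spectral_motion_loss(pred_frames, ref_frames, keep_frac=0.027):
    """L1 distance between sparsified spectra of frame-to-frame motion."""
    def truncated_spectrum(frames):
        # Motion signal: difference between consecutive frames, shape (T-1, H, W)
        diff = np.diff(frames, axis=0)
        spec = np.fft.fft2(diff)          # per-frame 2D FFT
        mags = np.abs(spec)
        # Keep only the largest-magnitude coefficients in each frame
        k = max(1, int(keep_frac * spec[0].size))
        flat = mags.reshape(mags.shape[0], -1)
        thresh = np.sort(flat, axis=1)[:, -k][:, None, None]
        return np.where(mags >= thresh, spec, 0.0)

    sp = truncated_spectrum(np.asarray(pred_frames, dtype=float))
    sr = truncated_spectrum(np.asarray(ref_frames, dtype=float))
    return float(np.mean(np.abs(sp - sr)))

# Identical clips incur zero loss; perturbed motion does not.
rng = np.random.default_rng(0)
clip = rng.standard_normal((4, 16, 16))
print(spectral_motion_loss(clip, clip))            # 0.0
print(spectral_motion_loss(clip, clip[::-1]) > 0)  # True
```

Keeping only the top-magnitude coefficients is one plausible way to reach the quoted 2.7% budget; a fixed low-frequency mask would be another.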
Supplementary Material: zip
Primary Area: generative models
Submission Number: 4165