Keywords: Data attribution, Video generation
TL;DR: Our method, MOTIVE, is a scalable, motion-centric data attribution framework for video generative models.
Abstract: Despite the rapid progress of video generative models, the role of data in shaping motion quality is poorly understood. We present MOTIVE (MOtion Training Influence for Video gEneration), a motion-centric, gradient-based data attribution framework that scales to modern, large, high-quality video datasets and models, and we use it to study which finetuning clips improve or degrade temporal dynamics. MOTIVE isolates temporal dynamics from static appearance via flow-weighted loss masks, yielding influence scores that remain practical at this scale. On text-to-video models, MOTIVE identifies clips that strongly affect motion and guides data curation that improves temporal consistency and physical plausibility. Finetuning on MOTIVE-selected high-influence data improves both motion smoothness and dynamic degree on VBench, achieving a 76.7% human preference win rate against the pretrained base model. To our knowledge, this is the first framework that attributes motion (not just appearance) in video generative models and uses it to curate finetuning data.
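To make the flow-weighted attribution idea concrete, below is a minimal sketch in PyTorch. It assumes optical flow is precomputed per clip and uses a TracIn-style gradient inner product as the influence score; the exact loss, flow source, and inner-product estimator are assumptions for illustration, not the paper's specification.

```python
import torch


def flow_weighted_mask(flow: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Turn per-pixel optical flow (B, T, 2, H, W) into a loss mask that
    emphasizes high-motion regions; normalized to mean 1 per clip so the
    overall loss scale is preserved."""
    mag = torch.linalg.norm(flow, dim=2)                      # (B, T, H, W)
    return mag / (mag.mean(dim=(1, 2, 3), keepdim=True) + eps)


def motion_weighted_loss(pred: torch.Tensor, target: torch.Tensor,
                         flow: torch.Tensor) -> torch.Tensor:
    """Per-pixel reconstruction error weighted by the flow mask, so gradients
    reflect temporal dynamics rather than static appearance."""
    per_pixel = (pred - target).pow(2).mean(dim=2)            # mean over channels
    return (flow_weighted_mask(flow) * per_pixel).mean()


def influence_score(model: torch.nn.Module, train_batch: dict,
                    probe_batch: dict) -> float:
    """TracIn-style influence (an assumed estimator): dot product between the
    motion-weighted gradient of a training clip and that of a probe clip."""
    params = [p for p in model.parameters() if p.requires_grad]

    def grads(batch):
        pred = model(batch["input"])
        loss = motion_weighted_loss(pred, batch["target"], batch["flow"])
        return torch.autograd.grad(loss, params)

    g_train, g_probe = grads(train_batch), grads(probe_batch)
    return sum((a * b).sum() for a, b in zip(g_train, g_probe)).item()
```

A positive score would mark a finetuning clip whose motion-weighted gradient aligns with the probe's, i.e., a candidate for the high-influence curation set.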
Supplementary Material: zip
Submission Number: 17