EffiVMT: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

Published: 26 Jan 2026, Last Modified: 28 Feb 2026
Venue: ICLR 2026 Poster
Readers: Everyone
License: CC BY 4.0
Keywords: Video diffusion transfer; Video motion transfer; Efficiency
TL;DR: A two-stage video motion transfer framework that tunes a powerful video diffusion transformer to synthesize video clips with complex motion.
Abstract: Recent breakthroughs in video diffusion transformers have shown remarkable capabilities in diverse motion generation. For the motion-transfer task, current methods mainly rely on two-stage Low-Rank Adaptation (LoRA) finetuning to obtain better performance. However, existing adaptation-based motion transfer still suffers from **motion inconsistency** and **tuning inefficiency** when applied to large video diffusion transformers. Naive two-stage LoRA tuning struggles to maintain motion consistency between the generated and input videos because of the inherent spatial-temporal coupling in the 3D attention operator. In addition, these methods require time-consuming finetuning in both stages. To tackle these issues, we propose EffiVMT, an efficient **three-stage** video motion transfer framework that finetunes a powerful video diffusion transformer to synthesize complex motion. In **stage 1**, we propose a spatial-temporal head classification technique that decouples the heads of 3D attention into distinct groups for spatial-appearance and temporal-motion processing. In **stage 2**, we finetune the spatial heads. In **stage 3**, temporal head tuning, we design sparse motion sampling and adaptive RoPE to accelerate tuning. To address the lack of a benchmark in this field, we introduce MotionBench, a comprehensive benchmark covering diverse motion types, including creative camera motion, single-object motion, multiple-object motion, and complex human motion. Extensive evaluations on MotionBench verify the superiority of EffiVMT.
Supplementary Material: zip
Primary Area: generative models
Submission Number: 11725