Spectral Motion Alignment for Video Motion Transfer Using Diffusion Models

Published: 01 Jan 2025, Last Modified: 02 Aug 2025 · AAAI 2025 · CC BY-SA 4.0
Abstract: Diffusion models have significantly facilitated the customization of an input video with a target appearance while preserving its motion patterns. To distill motion information from video frames, existing works typically estimate motion representations as frame differences or correlations in pixel or feature space. Despite their simplicity, these methods have underexplored limitations, including a lack of global motion context and the introduction of motion-independent spatial distortions. To address this, we present Spectral Motion Alignment (SMA), a novel framework that refines and aligns motion representations in the spectral domain. Specifically, SMA learns spectral motion representations, facilitating the learning of whole-frame global motion dynamics and effectively mitigating motion-independent artifacts. Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
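To make the core idea concrete, the sketch below illustrates one plausible way to align motion in the spectral domain: motion is first estimated as frame differences, those differences are mapped into the frequency domain with a 2D FFT, and an alignment loss emphasizes low spatial frequencies so that global motion dynamics dominate while high-frequency, motion-independent artifacts are down-weighted. This is a hypothetical, minimal sketch, not the authors' exact formulation; all function names (`spectral_motion_loss`, `low_frequency_mask`, etc.) and the choice of a Gaussian frequency mask are illustrative assumptions.

```python
# Hypothetical sketch of a spectral motion-alignment loss (illustrative only;
# not the paper's exact method). Videos are tensors of shape (T, C, H, W).
import torch
import torch.fft


def frame_differences(video: torch.Tensor) -> torch.Tensor:
    """Estimate motion as per-frame differences: (T, C, H, W) -> (T-1, C, H, W)."""
    return video[1:] - video[:-1]


def spectral_motion(video: torch.Tensor) -> torch.Tensor:
    """Map frame differences into the 2D spectral domain (per frame and channel)."""
    diffs = frame_differences(video)
    return torch.fft.fftshift(torch.fft.fft2(diffs, norm="ortho"), dim=(-2, -1))


def low_frequency_mask(h: int, w: int, radius: float = 0.25) -> torch.Tensor:
    """Soft Gaussian mask emphasizing low spatial frequencies (global motion)."""
    fy = torch.linspace(-0.5, 0.5, h).view(-1, 1)
    fx = torch.linspace(-0.5, 0.5, w).view(1, -1)
    dist = torch.sqrt(fy ** 2 + fx ** 2)
    return torch.exp(-(dist / radius) ** 2)


def spectral_motion_loss(src_video: torch.Tensor, gen_video: torch.Tensor) -> torch.Tensor:
    """Align the generated video's motion spectrum with the source video's."""
    src_spec = spectral_motion(src_video)
    gen_spec = spectral_motion(gen_video)
    mask = low_frequency_mask(src_video.shape[-2], src_video.shape[-1]).to(src_video.device)
    # Compare magnitudes so that appearance-driven phase changes matter less.
    return (mask * (src_spec.abs() - gen_spec.abs()).abs()).mean()
```

In a video customization pipeline, such a term would typically be added to the fine-tuning or guidance objective of the diffusion model so that the generated video's motion spectrum tracks that of the source video.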