Abstract: Video style transfer, which aims to transform a source video into another video with a different appearance while preserving its original structure, plays an important role in the video production industry. Existing methods often edit the first frame with an image editing tool and feed it into an image-to-video generation model with source video guidance to generate the edited video. Although such a paradigm enables users to perform creative video editing with powerful image editing tools, it relies heavily on the native propagation capability of the video generation model, which can be limited by having only the first frame as appearance guidance. As a result, the edited video suffers from appearance drifting and structure distortion, leading to severe inconsistencies as the video progresses. To this end, we propose EditProp, a novel video style transfer framework with two propagation stages: i) In the Keyframe Propagation stage, the edit in the first keyframe is faithfully propagated to the other keyframes with an image-based in-context generation model, producing high-quality edited keyframes with strong appearance consistency. ii) Then, in the subsequent Video Propagation stage, the source video structure and the propagated keyframes are injected into the video generation model as control signals, providing sufficient appearance and structure guidance to generate the translated video. Experimental results demonstrate that EditProp enables effective transfer across various styles, achieving superior editing results with strong appearance and structure consistency. Furthermore, thanks to our versatile keyframe-based propagation, our framework also enables additional applications such as smooth video style transition and long video style transfer.
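To make the two-stage pipeline concrete, the following is a minimal, purely illustrative sketch of the control flow the abstract describes. All function names, the keyframe stride, and the frame representation (a dict with `structure` and `style` fields) are assumptions for illustration; the actual EditProp models (the in-context image generator and the controlled video generator) are replaced here by trivial stubs.

```python
# Hypothetical sketch of EditProp's two-stage propagation (not the paper's code).
# Frames are represented as {"structure": ..., "style": ...} dicts; real models
# are stubbed out so only the pipeline's control flow is shown.

def select_keyframes(num_frames, stride=4):
    """Pick evenly spaced keyframe indices (stride is an assumed choice)."""
    return list(range(0, num_frames, stride))

def propagate_keyframes(frames, keyframe_ids, edited_first):
    """Stage 1 (Keyframe Propagation): an image-based in-context model would
    propagate the first keyframe's edit to the other keyframes. Stubbed here
    by copying the edited style while keeping each keyframe's structure."""
    style = edited_first["style"]
    return {i: {"structure": frames[i]["structure"], "style": style}
            for i in keyframe_ids}

def propagate_video(frames, edited_keyframes):
    """Stage 2 (Video Propagation): a video generation model would synthesize
    all frames, conditioned on the source structure of every frame and the
    appearance of the edited keyframes. Stubbed as per-frame assembly."""
    style = next(iter(edited_keyframes.values()))["style"]
    output = []
    for i, frame in enumerate(frames):
        if i in edited_keyframes:
            output.append(edited_keyframes[i])
        else:
            output.append({"structure": frame["structure"], "style": style})
    return output

def edit_prop(frames, edited_first_keyframe):
    """Full pipeline: keyframe propagation, then video propagation."""
    keyframe_ids = select_keyframes(len(frames))
    keyframes = propagate_keyframes(frames, keyframe_ids, edited_first_keyframe)
    return propagate_video(frames, keyframes)
```

The point of the sketch is the data flow: appearance guidance comes from *many* edited keyframes rather than only the first frame, while per-frame structure always comes from the source video.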
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zhixiang_Wang1
Submission Number: 7829