Abstract: Video outpainting aims to extend the content of a video beyond its original spatial boundaries. Existing methods tend to condition the generation process on a single frame or caption, failing to address the challenges posed by long videos composed of multiple clips. To address this, we extend diffusion-based image inpainting to a 3D diffusion model with a motion module for video outpainting. We then design four conditioning schemes to identify the most appropriate conditioning strategy: Frame-wise Context, Frame-wise Adapter, Text, and Free Condition. Additionally, to enhance continuity in long video generation, we propose a novel Motion Momentum Update (MMU) method based on a training-free technique. Experimental results demonstrate that our proposed Free Condition strategy, termed Free-Outpainter, achieves state-of-the-art performance on video outpainting tasks. Ablation studies further show that the condition-free training paradigm is the most effective at avoiding inaccurate external information while fully exploiting the video's inherent content. Benefiting from the Free Condition strategy, our pipeline reduces inference time to 5 seconds for 512×512×16 frames with only 10 GB of GPU memory. More results can be found on our demo page https://lilyn3125.github.io/FreeOutpainting/.