Context Preserving Autoregressive Frame Generation for Bounded Video

07 May 2025 (modified: 29 Oct 2025) · Submitted to NeurIPS 2025 · CC BY 4.0
Keywords: Generative model; diffusion model; video generation
Abstract: Various video generation methods have recently been proposed, as diffusion models have demonstrated a superior ability to generate high-quality videos. In particular, autoregressive approaches enable the generation of videos of arbitrary length. However, these methods are not suitable for bounded video generation, as they produce open-ended videos. Meanwhile, recent methods for bounded video generation rely on flipping frames to satisfy the boundary constraint imposed by the ending frame; this contradicts the inherent bias of video models to generate frames in the forward direction, limiting generation capability. Accordingly, we propose a novel autoregressive approach for bounded video generation. We introduce a context-aware bidirectional denoising method that progressively generates frames in both the forward and backward directions while accounting for frame context, together with a method that mitigates the context gap between the two directions to ensure a smooth and coherent transition between the sequences. Experimental results demonstrate the superiority of our approach over previous methods: because it aligns with the video model's forward generation bias, the output videos exhibit more realistic motion dynamics, and by maintaining a consistent frame length for the model input, our method produces frames with enhanced visual quality. More results can be found on our project page.
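
The abstract describes bidirectional autoregressive generation only at a high level. The Python sketch below is a hypothetical illustration of the general idea, not the authors' model or algorithm: denoise_chunk is a placeholder for a diffusion-model call, generate_bounded and the cross-fade seam smoothing are assumptions made for illustration, and the paper's actual context-gap mitigation may differ entirely.

import numpy as np

def denoise_chunk(context, chunk_len):
    # Placeholder for a diffusion-model call: return `chunk_len` new frames that
    # continue the given context frames (here, just noisy copies of the last one).
    last = context[-1]
    return [last + 0.01 * np.random.randn(*last.shape) for _ in range(chunk_len)]

def generate_bounded(start, end, total_len, chunk_len=4):
    # Grow the video from both ends so each pass runs in the model's preferred
    # forward direction; the backward half is built from the ending frame and
    # reversed only when the two halves are joined.
    n_fwd = total_len // 2
    n_bwd = total_len - n_fwd
    fwd, bwd = [start], [end]
    while len(fwd) < n_fwd:
        fwd.extend(denoise_chunk(np.stack(fwd), min(chunk_len, n_fwd - len(fwd))))
    while len(bwd) < n_bwd:
        bwd.extend(denoise_chunk(np.stack(bwd), min(chunk_len, n_bwd - len(bwd))))
    back = list(reversed(bwd))  # restore chronological order
    # Smooth the seam by cross-fading a few frames on each side toward the
    # opposite boundary frame (an assumed, simplistic stand-in for the paper's
    # context-gap mitigation).
    blend = min(chunk_len, n_fwd - 1, n_bwd - 1)
    seam_f, seam_b = fwd[-1].copy(), back[0].copy()
    for i in range(blend):
        w = (blend - i) / (blend + 1)  # strongest blending nearest the seam
        fwd[-1 - i] = (1 - w) * fwd[-1 - i] + w * seam_b
        back[i] = (1 - w) * back[i] + w * seam_f
    return np.stack(fwd + back)

# Example: a 16-frame video bounded by a black start frame and a white end frame.
video = generate_bounded(np.zeros((8, 8, 3)), np.ones((8, 8, 3)), total_len=16)
print(video.shape)  # (16, 8, 8, 3)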
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 8464