MIGA: Make Train-Free Infinite Frame Generation Great Again for Consistent Long Videos

04 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: long video generation
Abstract: Train-free long video generation aims to extend foundation video generation models, which are typically limited to short clips, to much longer durations without relying on significant computational or data resources. Predicting noise directly over the entire long latent sequence incurs substantial computational overhead. In contrast, frame-level autoregressive frameworks, e.g., FIFO-diffusion, can generate infinitely long videos with constant memory consumption. However, the substantial gap between the training and inference phases hinders the effective utilization of foundation models. Moreover, long-term consistency is central to long video generation, yet existing methods pay insufficient attention to it. To address these issues, we propose **MIGA**, a novel infinite-frame long video generation method. **(i)** First, observing that the training-inference gap mainly stems from the excessive noise span of the latents fed to the model during inference, we propose an effective two-stage alignment mechanism: the generation process of existing frameworks is partitioned into two dedicated stages with reduced noise spans, efficiently unlocking the capabilities of advanced foundation models. **(ii)** In addition, building on the intrinsic properties of frame-level autoregressive frameworks, we introduce a dual consistency enhancement mechanism: a self-reflection approach evaluates and corrects early high-noise frames, while a long-range frame guidance approach leverages later low-noise frames with broad coverage to steer the generation process. Together, these strategies promote consistency in the generated content. **(iii)** Finally, extensive experiments on VBench and NarrLV demonstrate the state-of-the-art performance of MIGA.
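To make the frame-level autoregressive setup and the two-stage noise-span split concrete, the sketch below is a minimal, hypothetical illustration rather than the authors' implementation: a FIFO-style queue holds frames at ascending noise levels; each step denoises the low-noise and high-noise halves of the queue with separate calls, so each call only sees a reduced span of noise levels, then pops the finished frame and enqueues a fresh pure-noise frame. All names here (`denoise_step`, `QUEUE_LEN`, the dummy update rule) are placeholders, not details from the paper.

```python
# Hypothetical sketch of FIFO-style diagonal denoising with a two-stage split.
import numpy as np

QUEUE_LEN = 16            # frames held in the sliding queue
LATENT_SHAPE = (4, 8, 8)  # toy per-frame latent size

def denoise_step(latents, noise_levels):
    """Placeholder for one denoiser call on frames whose noise levels span
    only part of the full range; a real system would invoke the foundation
    video diffusion model here."""
    return latents * 0.95  # dummy update standing in for a denoising step

def fifo_two_stage_step(queue, levels):
    """One autoregressive step: denoise the low-noise and high-noise halves
    separately (two stages, each with a reduced noise span), emit the fully
    denoised frame, and enqueue a new pure-noise frame."""
    half = QUEUE_LEN // 2
    # Stage 1: frames at low noise levels (nearly finished).
    queue[:half] = denoise_step(queue[:half], levels[:half])
    # Stage 2: frames at high noise levels (recently enqueued).
    queue[half:] = denoise_step(queue[half:], levels[half:])
    finished = queue[0].copy()                   # noise level ~0: output frame
    queue = np.roll(queue, -1, axis=0)           # shift the queue forward
    queue[-1] = np.random.randn(*LATENT_SHAPE)   # fresh pure-noise frame
    return finished, queue

# Generate an arbitrarily long sequence with constant memory.
levels = np.arange(QUEUE_LEN)                        # ascending noise levels
queue = np.random.randn(QUEUE_LEN, *LATENT_SHAPE)
frames = []
for _ in range(100):                                 # 100 steps -> 100 frames
    frame, queue = fifo_two_stage_step(queue, levels)
    frames.append(frame)
print(len(frames), frames[0].shape)
```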
Primary Area: generative models
Submission Number: 1842