Keywords: Video Generation, Diffusion-based Models, Model Distillation
TL;DR: We propose a new asymmetric structural distillation method that produces videos of superior quality.
Abstract: Due to bidirectional attention dependencies, video generation models generally suffer from $O(n^2)$ computational complexity. In this work, we identify a "local inter-frame information redundancy" phenomenon: video generation exhibits strong local temporal dependencies, while global attention to distant frames contributes only marginally. Building upon this finding, we introduce a novel distillation training paradigm for video diffusion models, namely GREEDY DISTILL.
Specifically, we propose the Streaming Diffusion Decoder (SDD) as the "Greedy Decoder", which generates the next frame conditioned only on the 0-th and the last frames, avoiding the redundant computation incurred by the remaining frames.
Meanwhile, we introduce the Efficient Temporal Module (ETM) to capture global temporal information across frames.
Together, these two modules reduce the computational complexity from $O(n^2)$ to $O(n)$. Moreover, we make a first attempt at applying RL fine-tuning to mitigate error accumulation during streaming generation.
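To illustrate where the linear complexity comes from, below is a minimal sketch (our own illustration, not the paper's code) of a per-frame attention pattern in which each new frame attends only to the 0-th frame and the immediately preceding frame; all function names, tensor shapes, and the attention formulation are assumptions made for exposition.

```python
# Minimal sketch: each frame attends only to frame 0 and the previous frame,
# so total attention cost grows linearly in the number of frames.
import torch
import torch.nn.functional as F

def greedy_frame_attention(frames: torch.Tensor) -> torch.Tensor:
    """frames: (n_frames, tokens_per_frame, dim); names/shapes are illustrative."""
    n, t, d = frames.shape
    outputs = [frames[0]]
    for i in range(1, n):
        q = frames[i]                               # queries from the current frame
        kv = torch.cat([frames[0], frames[i - 1]])  # keys/values: frame 0 + previous frame only
        attn = F.softmax(q @ kv.T / d**0.5, dim=-1)
        outputs.append(attn @ kv)                   # cost per frame is constant -> O(n) overall
    return torch.stack(outputs)

x = torch.randn(16, 64, 32)             # 16 frames, 64 tokens each, dim 32
print(greedy_frame_attention(x).shape)  # torch.Size([16, 64, 32])
```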
Our method achieves an overall score of 84.60 on the VBench benchmark, surpassing previous state-of-the-art methods by a large margin (+4.18%). Qualitative results also demonstrate its superior performance.
Leveraging its efficient model structure and KV caching, our method rapidly generates high-quality video streams at 24 FPS (nearly 50% faster) on a single H100 GPU.
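As a rough sketch of how KV caching keeps streaming cost constant, the toy loop below maintains only two cache entries (frame 0 and the latest frame), so per-step memory and compute do not grow with video length; `ToyDecoder`, `encode_kv`, and `step` are hypothetical stand-ins, not the released implementation.

```python
# Minimal sketch of constant-memory streaming with a two-entry KV cache.
import torch

class ToyDecoder:
    def encode_kv(self, frame):          # placeholder "KV" state for a frame
        return frame
    def step(self, frame0_kv, prev_kv):  # produce the next frame and its cache entry
        nxt = 0.5 * (frame0_kv + prev_kv) + 0.01 * torch.randn_like(prev_kv)
        return nxt, nxt

def stream_video(decoder, first_frame, n_frames):
    frame0_kv = prev_kv = decoder.encode_kv(first_frame)
    for _ in range(n_frames - 1):
        frame, prev_kv = decoder.step(frame0_kv, prev_kv)  # cache never grows
        yield frame

frames = list(stream_video(ToyDecoder(), torch.randn(3, 64, 64), 24))
print(len(frames), frames[0].shape)  # 23 torch.Size([3, 64, 64])
```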
Supplementary Material: zip
Primary Area: generative models
Submission Number: 5458