Flow Caching for Autoregressive Video Generation

ICLR 2026 Conference Submission 23495 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026, ICLR 2026, CC BY 4.0
Keywords: Autoregressive video generation, chunkwise caching, KV cache compression, ultra-long video synthesis, video acceleration
Abstract: Autoregressive models, often built on Transformer architectures, are a powerful paradigm for generating ultra-long videos by synthesizing content in sequential chunks. However, this sequential generation process is notoriously slow. While caching strategies have proven effective for accelerating traditional video diffusion models, existing methods assume uniform denoising across all frames, an assumption that breaks down in autoregressive models, where different video chunks exhibit varying similarity patterns at identical timesteps. In this paper, we present FlowCache, the first caching framework specifically designed for autoregressive video generation. Our key insight is that each video chunk should maintain an independent caching policy, allowing fine-grained control over which chunks require recomputation at each timestep. We introduce a chunkwise caching strategy that dynamically adapts to the unique denoising characteristics of each chunk, complemented by an importance-based KV cache compression mechanism that maintains fixed memory bounds while preserving generation quality. Our method achieves speedups of $\textbf{2.38}\times$ on MAGI-1 and $\textbf{6.7}\times$ on SkyReels-V2 with negligible quality change (VBench: $0.87\uparrow$ and $0.79\downarrow$, respectively). These results demonstrate that FlowCache successfully unlocks the potential of autoregressive models for real-time, ultra-long video generation, establishing a new benchmark for efficient video synthesis at scale. The code is available at https://anonymous.4open.science/r/FlowCache-23495iclr
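To make the two mechanisms the abstract names more concrete, here is a minimal sketch of what per-chunk cache decisions and importance-based KV compression could look like. The relative-change threshold test, the importance scores, and the function names (`should_recompute`, `compress_kv`) are illustrative assumptions for exposition, not FlowCache's actual algorithm; see the linked repository for the real implementation.

```python
import torch
from typing import Optional, Tuple


def should_recompute(chunk_feat: torch.Tensor,
                     cached_feat: Optional[torch.Tensor],
                     tau: float = 0.05) -> bool:
    """Per-chunk caching policy (hypothetical): recompute a chunk only when
    its features have drifted from the cached version by more than a
    relative threshold tau; otherwise reuse the cached result."""
    if cached_feat is None:
        return True  # nothing cached yet for this chunk
    rel_change = (chunk_feat - cached_feat).norm() / (cached_feat.norm() + 1e-8)
    return rel_change.item() > tau


def compress_kv(keys: torch.Tensor,
                values: torch.Tensor,
                importance: torch.Tensor,
                budget: int) -> Tuple[torch.Tensor, torch.Tensor]:
    """Importance-based KV compression (hypothetical): keep only the
    `budget` highest-importance token entries so cache memory stays fixed."""
    if keys.shape[0] <= budget:
        return keys, values
    keep = importance.topk(budget).indices.sort().values  # preserve temporal order
    return keys[keep], values[keep]


# Toy usage: each chunk keeps its own cache and decides independently,
# at every denoising timestep, whether to recompute.
cache = {}
for t in range(4):                       # denoising timesteps
    for c in range(3):                   # video chunks
        feat = torch.randn(16, 64)       # stand-in for chunk features
        if should_recompute(feat, cache.get(c)):
            cache[c] = feat              # "recompute" and refresh the cache

# Toy usage: bound the KV cache to 32 entries using a stand-in
# importance score (e.g., accumulated attention mass).
k, v = torch.randn(100, 8, 64), torch.randn(100, 8, 64)
imp = torch.rand(100)
k, v = compress_kv(k, v, imp, budget=32)
```

The design point the sketch illustrates is that the cache decision is made per chunk rather than globally per timestep, which is what distinguishes chunkwise caching from uniform-denoising caches built for standard video diffusion.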
Primary Area: generative models
Submission Number: 23495