Abstract: Video generation has become an increasingly important component of AI-generated content (AIGC), owing to its rich semantic expressiveness and growing application potential. Among various generative paradigms, diffusion models have recently gained prominence due to their strong controllability, competitive visual quality, and compatibility with multimodal inputs. However, most existing surveys provide limited coverage of diffusion-based video generation, often lacking systematic analysis and comprehensive comparisons. To address this gap, this paper presents a thorough and structured review of diffusion models for video generation. We first outline the theoretical foundations and core architectures of diffusion models, and then the key design principles of representative methods for video generation were introduced. We propose a unified taxonomy that categorizes over two hundred methods, analyzing their key characteristics, strengths, and limitations. In addition, we compared the performance of classical methods and summarized commonly used datasets and evaluation metrics in this field for ease of model benchmarking and selection. Finally, we discuss open problems and future research directions, aiming to provide a valuable reference for both academic research and practical development.
External IDs:dblp:journals/air/MaYJLLLCYMSZGGYF25
Loading