Keywords: video generation; distillation; reward model
Abstract: Recent video generation models achieve remarkable quality but often suffer from slow inference due to the iterative denoising process required by diffusion models. In this paper, we propose a novel distillation pipeline that leverages a reward model to improve the performance of the distilled video generation model. Specifically, our approach distills a 50-step diffusion model into a few-step video generator by matching the trajectory distribution. Furthermore, we integrate a carefully designed reward model into the training framework; this additional guidance not only mitigates the influence of redundant or uninformative data points during distillation but also enhances overall generation quality. The reward model provides fine-grained feedback on semantic consistency, visual fidelity, and temporal coherence. Extensive experiments demonstrate that our method achieves substantial acceleration in video generation.
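Below is a minimal, hypothetical PyTorch sketch of what a reward-weighted trajectory-matching distillation objective of this kind could look like. The function name `reward_weighted_distill_loss`, the softmax reward weighting, and the placeholder networks are illustrative assumptions, not the submission's actual implementation.

```python
import torch
import torch.nn.functional as F

def reward_weighted_distill_loss(student, teacher, reward_model, noisy_video, t):
    """Trajectory-matching distillation loss, reweighted by a reward model.

    Hypothetical sketch: both denoisers map a noisy video and a timestep to a
    denoised prediction; the reward model maps a video to a per-sample score.
    """
    # Teacher target: the frozen many-step teacher's prediction at timestep t.
    with torch.no_grad():
        target = teacher(noisy_video, t)
    # Student prediction at the same point on the trajectory.
    pred = student(noisy_video, t)
    # Per-sample matching loss (mean over channel/frame/spatial dims).
    per_sample = F.mse_loss(pred, target, reduction="none").mean(dim=(1, 2, 3, 4))
    # Reward scores (e.g. semantic consistency, fidelity, coherence) become
    # weights: low-reward, uninformative samples are down-weighted.
    with torch.no_grad():
        weights = torch.softmax(reward_model(pred), dim=0)
    return (weights * per_sample).sum()

# Toy usage with stand-in networks; video shape is (batch, channels, frames, H, W).
B, C, T, H, W = 2, 3, 8, 32, 32
x = torch.randn(B, C, T, H, W, requires_grad=True)
t = torch.randint(0, 1000, (B,))
denoiser = lambda v, ts: 0.9 * v                  # placeholder student/teacher
reward = lambda v: v.mean(dim=(1, 2, 3, 4))       # placeholder reward scores
loss = reward_weighted_distill_loss(denoiser, denoiser, reward, x, t)
loss.backward()
```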
Primary Area: generative models
Submission Number: 8725