OS-DiffVSR: towards one-step latent diffusion model for high-detailed real-world video super-resolution
Abstract: Recently, latent diffusion models have demonstrated promising performance on the real-world video super-resolution (VSR) task, reconstructing high-quality videos from distorted low-resolution input through multiple diffusion steps. Compared to image super-resolution (ISR), VSR must process every frame in a video, which poses challenges to inference efficiency. Video quality and inference efficiency have therefore always been a trade-off for diffusion-based VSR methods. In this work, we propose a One-Step Diffusion model for real-world Video Super-Resolution, namely OS-DiffVSR. Specifically, we devise a novel adversarial training paradigm, which can significantly improve the quality of synthesized videos. In addition, we devise a multi-frame fusion mechanism to maintain inter-frame temporal consistency and reduce flicker in the video. Extensive experiments on several popular VSR benchmarks demonstrate that OS-DiffVSR can achieve even better quality than existing diffusion-based VSR methods that require dozens of sampling steps.