Towards Redundancy Reduction in Diffusion Models for Efficient Video Super-Resolution

09 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Video Super-Resolution, Diffusion Model, One-Step Diffusion
Abstract: Diffusion models have recently shown promising results for video super-resolution (VSR). However, directly adapting generative diffusion models to VSR introduces redundancy, since low-quality videos already preserve substantial content information. This redundancy increases computational overhead and the learning burden: the model performs superfluous operations and must learn to filter out irrelevant information. To address this problem, we propose OASIS, an efficient $\textbf{o}$ne-step diffusion model with $\textbf{a}$ttention $\textbf{s}$pecialization for real-world v$\textbf{i}$deo $\textbf{s}$uper-resolution. OASIS incorporates attention specialization routing, which assigns attention heads to different attention patterns according to their intrinsic behaviors. This routing mitigates redundancy while preserving pretrained knowledge, allowing diffusion models to adapt better to VSR and achieve stronger performance. Moreover, we propose a simple yet effective progressive training strategy: training starts with temporally consistent degradations and then shifts to temporally inconsistent ones, which facilitates learning under complex degradations. Extensive experiments demonstrate that OASIS achieves state-of-the-art performance on both synthetic and real-world datasets. OASIS also offers superior inference speed, with a $\mathbf{6.2\times}$ speedup over one-step diffusion baselines such as SeedVR2. The code and models will be publicly available.
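The abstract names attention specialization routing only at a high level. As a reading aid, here is a minimal sketch of one plausible interpretation, in which each attention head is statically routed to either full (global) attention or a cheaper banded local pattern; all identifiers (`route_heads`, `head_is_global`, `local_window`) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch: per-head routing between two attention patterns.
# Heads flagged as "global" run full attention; the rest are restricted
# to a banded local window, cutting compute on redundant long-range pairs.
import torch
import torch.nn.functional as F

def route_heads(q, k, v, head_is_global, local_window=8):
    """q, k, v: (batch, heads, tokens, dim).
    head_is_global: (heads,) bool mask; True = full attention."""
    out = torch.empty_like(q)
    g = head_is_global
    if g.any():
        # Full attention for the globally routed heads.
        out[:, g] = F.scaled_dot_product_attention(q[:, g], k[:, g], v[:, g])
    l = ~g
    if l.any():
        # Banded mask: token i may only attend to |i - j| <= local_window.
        n = q.shape[2]
        idx = torch.arange(n, device=q.device)
        band = (idx[None, :] - idx[:, None]).abs() <= local_window
        out[:, l] = F.scaled_dot_product_attention(
            q[:, l], k[:, l], v[:, l], attn_mask=band)
    return out

# Usage: 8 heads, first 4 routed to full attention, last 4 to local.
q = k = v = torch.randn(2, 8, 64, 32)
y = route_heads(q, k, v, torch.tensor([True] * 4 + [False] * 4))
```

In this toy version the routing is a fixed boolean assignment; the paper's "intrinsic behaviors" presumably correspond to some learned or measured per-head statistic used to make that assignment.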
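The progressive training strategy can likewise be illustrated with a toy degradation scheduler, assuming a single switch point between the two phases. The `switch_step` value and the noise-only degradation below are placeholders for the paper's unstated schedule and its real blur/resize/noise/compression pipeline.

```python
# Hypothetical sketch of the progressive degradation curriculum:
# phase 1 shares one sampled degradation across all frames of a clip
# (temporally consistent); phase 2 re-samples parameters per frame.
import random
import torch

def sample_degradation_params():
    # Toy parameter draw (illustrative): noise strength only.
    return {"noise": random.uniform(0.0, 0.1)}

def apply_degradation(frame, params):
    # Toy stand-in for a realistic degradation pipeline.
    return frame + params["noise"] * torch.randn_like(frame)

def degrade_clip(frames, step, switch_step=10_000):
    """frames: list of (C, H, W) tensors; step: training iteration."""
    if step < switch_step:
        # Phase 1: one parameter set for the whole clip (consistent).
        params = sample_degradation_params()
        return [apply_degradation(f, params) for f in frames]
    # Phase 2: independent parameters per frame (inconsistent).
    return [apply_degradation(f, sample_degradation_params()) for f in frames]
```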
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 3420