SkipVSR: Adaptive Patch Routing for Video Super-Resolution with Inter-Frame Mask

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 | MM2024 Poster | CC BY 4.0
Abstract: Deep neural networks have shown enormous potential in video super-resolution (VSR), yet their high computational cost limits deployment on resource-limited devices and in real-world scenarios, especially when multiple frames are restored simultaneously. Existing VSR models contain many redundant filters, which drag down inference efficiency. To accelerate the inference of VSR models, we propose a scalable method based on adaptive patch routing that achieves more practical speedup. Specifically, we design a confidence estimator to predict how well each block aggregates information from adjacent patches. The estimator learns to perform block skipping dynamically, i.e., to choose which basic blocks of a VSR network to execute during inference, so as to reduce total computation as much as possible without dramatically degrading reconstruction accuracy. However, we observe that skipping errors are amplified as the hidden states propagate through recurrent networks. To alleviate this issue, we design temporal feature distillation to preserve performance. In essence, our approach provides an adaptive routing scheme for each patch. Extensive experiments demonstrate that our method not only accelerates inference but also delivers strong quantitative and qualitative results with the learned strategies. Built upon a BasicVSR model, our method achieves a speedup of 20% on average, and as high as 50% for some images, while maintaining competitive performance on REDS4.
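The block-skipping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `make_block`, `toy_estimator`, and `route_patch`, the variance-based confidence score, and the `threshold` parameter are all hypothetical stand-ins chosen for illustration. In the paper the estimator is learned; here a hand-written heuristic takes its place so the routing logic can be read in isolation.

```python
import math
import random


def make_block(seed, dim):
    """Hypothetical residual block: a fixed random linear map followed by tanh."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def block(x):
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]

    return block


def toy_estimator(x, block_index):
    """Stand-in confidence score in [0, 1) (assumption, not the learned estimator).

    Uses the variance of the patch features: a flat patch scores 0,
    a textured patch scores closer to 1. block_index is unused here,
    but a learned estimator could condition on it.
    """
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    return 1.0 - math.exp(-10.0 * variance)


def route_patch(x, blocks, estimator, threshold=0.5):
    """Run only the blocks whose confidence score reaches the threshold.

    Skipped blocks act as the identity, so the patch features pass through
    unchanged. Returns the output features and the number of executed blocks.
    """
    executed = 0
    for i, block in enumerate(blocks):
        if estimator(x, i) >= threshold:
            residual = block(x)
            x = [a + b for a, b in zip(x, residual)]
            executed += 1
    return x, executed


blocks = [make_block(seed, dim=4) for seed in range(6)]

flat_patch = [0.5, 0.5, 0.5, 0.5]        # no texture: every block is skipped
textured_patch = [0.0, 1.0, 0.0, 1.0]    # high variance: at least the first block runs

flat_out, flat_executed = route_patch(flat_patch, blocks, toy_estimator)
tex_out, tex_executed = route_patch(textured_patch, blocks, toy_estimator)
```

Under these assumptions, the per-patch saving is simply the fraction of blocks skipped, `1 - executed / len(blocks)`. The sketch ignores the recurrent propagation and the temporal feature distillation that the abstract describes as necessary to keep skipping errors from accumulating.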
Primary Subject Area: [Experience] Multimedia Applications
Relevance To Conference: Image super-resolution aims to generate high-resolution images from low-resolution counterparts. By improving the visual quality of images, SR techniques can enhance the overall multimedia experience. High-resolution images provide more details, sharpness, and clarity, resulting in improved visual perception for multimedia applications such as video streaming, gaming, and image editing.
Submission Number: 5209