Abstract: Video super-resolution aims to reconstruct a high-resolution video from its degraded low-resolution observation. Existing methods primarily model temporal information with explicit alignment strategies, including optical flow and motion compensation, or implicit alignment strategies, including deformable convolution and non-local attention. However, these alignment strategies, which rely on motion estimation and motion compensation (MEMC), inevitably introduce inaccurate inter-frame information, which degrades reconstruction performance. To avoid erroneous compensation information, we propose to model temporal information from the perspective of self-similarity between frames and design a multi-frame correlated representation network (MCRNet) for video super-resolution. To address temporal information distortion caused by large-scale pixel displacement and object occlusion, MCRNet extracts temporal information from similar regions across multiple frames and aggregates them with allocated weights for information compensation. Moreover, we design a multi-scale non-local information fusion module that performs non-local correlation matching of spatio-temporal features in multi-scale space, maintaining their scale consistency. Experimental results on multiple datasets indicate that MCRNet achieves promising gains over competing methods that employ explicit or implicit alignment strategies.
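The abstract's core mechanism is aggregating information from similar regions across frames with similarity-derived weights rather than warping frames via MEMC. The paper's actual implementation is not shown here; the following is a minimal sketch of that general idea under standard non-local attention assumptions, where softmax similarity scores between a reference frame and its neighbors serve as the allocated weights. All function names, tensor shapes, and the scaled-dot-product formulation are illustrative assumptions, not MCRNet's design.

```python
# Hypothetical sketch of cross-frame aggregation by self-similarity.
# Not the paper's code; shapes and scaling are assumptions.
import torch


def cross_frame_aggregation(ref, neighbors):
    """Compensate a reference frame with similar regions from neighbors.

    ref:       reference-frame features, shape (B, C, H, W)
    neighbors: neighbor-frame features,  shape (B, T, C, H, W)

    Each reference position attends to every position of every neighbor
    frame; the softmax similarity scores act as aggregation weights, so
    no motion estimation or warping is needed.
    """
    B, C, H, W = ref.shape

    q = ref.flatten(2).transpose(1, 2)               # (B, H*W, C)
    k = neighbors.permute(0, 2, 1, 3, 4).flatten(2)  # (B, C, T*H*W)
    v = k.transpose(1, 2)                            # (B, T*H*W, C)

    attn = torch.softmax(q @ k / C**0.5, dim=-1)     # similarity weights
    out = (attn @ v).transpose(1, 2).view(B, C, H, W)
    return ref + out                                 # residual compensation
```

Because occluded or heavily displaced content in one neighbor simply receives low similarity weight while better-matching regions in other frames receive more, such weighted aggregation degrades gracefully where warping-based MEMC would inject wrong pixels.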