Abstract: In video super-resolution, exploiting the spatial information of the reference frame and the temporal information from neighbouring frames is important but challenging. Since existing image super-resolution (SR) methods have achieved remarkable reconstruction results, in this paper we propose a generic frame-wise dynamic fusion module (DFM) that fully aggregates temporal information into the reference frame. Specifically, we employ dynamic convolution to flexibly fuse element-wise temporal information frame by frame. Before fusion, to handle large motion across frames, we propose a self-calibrated deformable (SCD) alignment module, in which motion offsets are predicted via self-calibrated convolutions that explicitly expand the receptive field of each convolutional layer through internal communication in a multi-resolution manner. The aligned features of each neighbouring frame are then fed to the DFM for temporal information fusion. Finally, the reference features, now containing both spatial and temporal information, are passed to the SR reconstruction module to produce the high-resolution frame. Experimental results on several datasets demonstrate performance superior to published state-of-the-art video super-resolution methods.
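The abstract only names the components, so as a rough illustration (not the authors' implementation) the sketch below shows one way frame-wise dynamic fusion via dynamic convolution could look in PyTorch: for each aligned neighbouring-frame feature, a small network predicts a per-frame depthwise kernel that is applied to fold that frame's features into the reference features. All names (`DynamicFusion`, `kernel_net`) and design choices (globally pooled kernel prediction, per-channel depthwise kernels, residual accumulation) are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFusion(nn.Module):
    """Minimal sketch of frame-wise dynamic fusion (illustrative, not the paper's code)."""

    def __init__(self, channels=64, ksize=3):
        super().__init__()
        self.ksize = ksize
        # Predict one depthwise kernel per (frame, channel) from the
        # concatenated reference and aligned-neighbour features.
        self.kernel_net = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels * ksize * ksize, kernel_size=1),
        )

    def forward(self, ref_feat, aligned_feats):
        # ref_feat: (B, C, H, W); aligned_feats: list of (B, C, H, W) tensors,
        # one per neighbouring frame, already aligned to the reference.
        fused = ref_feat
        for nbr in aligned_feats:
            b, c, h, w = nbr.shape
            k = self.kernel_net(torch.cat([ref_feat, nbr], dim=1))
            k = k.view(b * c, 1, self.ksize, self.ksize)
            # Normalise each predicted kernel so it acts as a weighting filter.
            k = F.softmax(k.flatten(2), dim=2).view_as(k)
            # Grouped conv applies a separate kernel to every (batch, channel)
            # pair, i.e. a frame-specific dynamic convolution.
            out = F.conv2d(nbr.reshape(1, b * c, h, w), k,
                           padding=self.ksize // 2, groups=b * c)
            fused = fused + out.view(b, c, h, w)
        return fused

# Usage: fuse two aligned neighbour frames into the reference features.
dfm = DynamicFusion(channels=64)
ref = torch.randn(2, 64, 32, 32)
nbrs = [torch.randn(2, 64, 32, 32) for _ in range(2)]
print(dfm(ref, nbrs).shape)  # torch.Size([2, 64, 32, 32])
```

In a full pipeline, the SCD alignment module would produce `aligned_feats` from the neighbouring frames before this fusion step, and the fused output would feed the SR reconstruction module.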