Multi-Scale Coarse-to-Fine Transformer for Frame Interpolation

ACM Multimedia 2022 (modified: 17 Nov 2022)
Abstract: The majority of prevailing video interpolation methods compute optical flows to estimate the intermediate motion. However, accurate estimation of the intermediate motion is difficult under a low-order motion-model hypothesis, which creates substantial difficulties for subsequent processing. To alleviate this limitation, we propose a two-stage, flow-free video interpolation architecture. Rather than relying on pre-defined motion models, our method represents complex motion through data-driven learning. In the first stage, we analyze spatio-temporal information and generate coarse anchor-frame features. In the second stage, we employ transformers to transfer neighboring features to the intermediate time steps and enhance the spatial textures. To improve the quality of the coarse anchor-frame features and the robustness in handling multi-scale textures with large-scale motion, we propose a multi-scale architecture and transformers with variable token sizes that progressively enhance the features. The experimental results demonstrate that our model outperforms state-of-the-art methods on both single-frame and multi-frame interpolation tasks, and extensive ablation studies verify the effectiveness of our model.
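The idea of "transformers with variable token sizes" over a multi-scale pyramid can be illustrated with a minimal sketch. This is a hypothetical toy example, not the authors' code: it only shows how, if the token size shrinks in step with the feature-map resolution at each pyramid level, the transformer sequence length stays constant while the effective receptive field per token changes from coarse to fine.

```python
# Hypothetical sketch (not the paper's implementation): partition each
# pyramid level's feature map into non-overlapping square tokens whose
# size varies with the scale, keeping the token count per level fixed.

def tokenize(height, width, token_size):
    """Return the (row, col) origins of non-overlapping square tokens."""
    return [(r, c)
            for r in range(0, height, token_size)
            for c in range(0, width, token_size)]

# Example pyramid (assumed sizes): 32x32, 16x16, 8x8 feature maps with
# token sizes 8, 4, 2 -- large tokens at fine resolution cover broad
# context for large motion; small tokens at coarse levels refine detail.
pyramid = [(32, 8), (16, 4), (8, 2)]
for side, token_size in pyramid:
    tokens = tokenize(side, side, token_size)
    print(side, token_size, len(tokens))  # every level yields 16 tokens
```

Because each level produces the same number of tokens, the same attention machinery can be reused across scales; only the patch embedding that flattens each token changes per level.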