Abstract: Traditional frame-based video frame interpolation (VFI) methods rely on the assumptions of linear motion and brightness constancy, which can lead to severe errors in scenarios with high-speed motion. To tackle this challenge, inspired by the ability of event cameras to asynchronously record brightness changes at each pixel, we propose a Fast-Slow joint synthesis framework for event-enhanced high-speed video frame interpolation, named SuperFast, which generates high frame rate (5000 FPS, 200× the input frame rate) video from a low frame rate (25 FPS) input video and the corresponding event stream. Our framework divides the task into two sub-tasks, i.e., video frame interpolation for content with and without high-speed motion, which are handled by two corresponding branches: the fast synthesis pathway and the slow synthesis pathway. The fast synthesis pathway leverages a spiking neural network to encode the input event stream and combines the boundary frames to generate intermediate results through synthesis and refinement, targeting content with high-speed motion. The slow synthesis pathway stacks the two input boundary frames and the event stream to synthesize intermediate results, focusing on content with relatively slow motion. Finally, a fusion module trained with a comparison loss produces the final video frame interpolation results. We also build a hybrid visual acquisition system containing an event camera and a high frame rate camera, and collect the first 5000 FPS High-Speed Event-enhanced Video frame Interpolation (THU$^{\text{HSEVI}}$) dataset. To evaluate the performance of our framework, we conduct experiments on our THU$^{\text{HSEVI}}$ dataset and the existing HS-ERGB dataset. Experimental results demonstrate that our framework achieves state-of-the-art 200× video frame interpolation performance in high-speed motion scenarios.
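To make the Fast-Slow decomposition concrete, below is a minimal PyTorch sketch of the two-pathway structure the abstract describes. Everything here is an illustrative assumption rather than the authors' implementation: the spiking neural network of the fast pathway is replaced by a plain convolutional stand-in, the event stream is assumed to be voxelized into 8 temporal bins, and the fusion module is reduced to a learned per-pixel blend (the comparison loss used for training is omitted).

```python
import torch
import torch.nn as nn

class FastPathway(nn.Module):
    """Stand-in for the fast synthesis pathway (high-speed content).

    The paper encodes the event stream with a spiking neural network;
    a plain convolutional encoder is substituted here so the sketch
    stays self-contained and runnable.
    """
    def __init__(self, event_bins=8, frame_ch=3, feat=32):
        super().__init__()
        self.event_encoder = nn.Sequential(
            nn.Conv2d(event_bins, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
        )
        # Synthesis head: combines encoded events with both boundary frames.
        self.synth = nn.Conv2d(feat + 2 * frame_ch, frame_ch, 3, padding=1)
        # Refinement head: residual correction of the synthesized frame.
        self.refine = nn.Conv2d(frame_ch, frame_ch, 3, padding=1)

    def forward(self, frame0, frame1, events):
        e = self.event_encoder(events)
        coarse = self.synth(torch.cat([e, frame0, frame1], dim=1))
        return coarse + self.refine(coarse)

class SlowPathway(nn.Module):
    """Stand-in for the slow synthesis pathway: the two boundary frames
    and the event stream are stacked channel-wise and synthesized directly."""
    def __init__(self, event_bins=8, frame_ch=3, feat=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * frame_ch + event_bins, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, frame_ch, 3, padding=1),
        )

    def forward(self, frame0, frame1, events):
        return self.net(torch.cat([frame0, frame1, events], dim=1))

class Fusion(nn.Module):
    """Stand-in fusion module: predicts a per-pixel mask that blends the
    fast and slow results into the final interpolated frame."""
    def __init__(self, frame_ch=3):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(2 * frame_ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, fast_out, slow_out):
        m = self.mask(torch.cat([fast_out, slow_out], dim=1))
        return m * fast_out + (1 - m) * slow_out

# Usage: one intermediate frame from two boundary frames and events.
fast, slow, fuse = FastPathway(), SlowPathway(), Fusion()
f0 = torch.rand(1, 3, 128, 128)   # boundary frame at t=0
f1 = torch.rand(1, 3, 128, 128)   # boundary frame at t=1
ev = torch.rand(1, 8, 128, 128)   # voxelized event stream (assumed 8 bins)
out = fuse(fast(f0, f1, ev), slow(f0, f1, ev))  # (1, 3, 128, 128)
```

In the actual framework, repeating this synthesis at many timestamps between the two boundary frames is what yields the 200× frame-rate increase; the sketch only shows the single-frame data flow through the two pathways and the fusion step.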