Abstract: Traditional frame-based video frame interpolation (VFI) methods rely on the assumptions of linear motion and brightness constancy, which can lead to severe errors in scenarios with high-speed motion. To tackle this challenge, inspired by the ability of event cameras to asynchronously record brightness changes at each pixel, we propose a Fast-Slow joint synthesis framework for event-enhanced high-speed video frame interpolation, named SuperFast, which generates high frame rate (5000 FPS, 200× the input frame rate) video from a low frame rate (25 FPS) input video and the corresponding event stream. Our framework divides the task into two sub-tasks, i.e., video frame interpolation for content with and without high-speed motion, which are handled by two corresponding branches: the fast synthesis pathway and the slow synthesis pathway. The fast synthesis pathway leverages a spiking neural network to encode the input event stream and combines the boundary frames to generate intermediate results through synthesis and refinement, targeting content with high-speed motion. The slow synthesis pathway stacks the two input boundary frames and the event stream to synthesize intermediate results, focusing on content with relatively slow motion. Finally, a fusion module trained with a comparison loss produces the final video frame interpolation results. We also build a hybrid visual acquisition system containing an event camera and a high frame rate camera, and collect the first 5000 FPS High-Speed Event-enhanced Video frame Interpolation (THU$^{\text{HSEVI}}$) dataset. To evaluate the performance of our framework, we conduct experiments on our THU$^{\text{HSEVI}}$ dataset and the existing HS-ERGB dataset. Experimental results demonstrate that our framework achieves state-of-the-art 200× video frame interpolation performance in high-speed motion scenarios.
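To make the Fast-Slow decomposition concrete, below is a minimal PyTorch sketch of the two-pathway structure the abstract describes. Everything here is an illustrative assumption rather than the authors' implementation: the spiking neural network of the fast pathway is replaced by a plain convolutional stand-in, the event stream is assumed to be voxelized into 8 temporal bins, and the fusion module is reduced to a learned per-pixel blend (the comparison loss used for training is omitted).

```python
import torch
import torch.nn as nn

class FastPathway(nn.Module):
    """Stand-in for the fast synthesis pathway (high-speed content).

    The paper encodes the event stream with a spiking neural network;
    a plain convolutional encoder is substituted here so the sketch
    stays self-contained and runnable.
    """
    def __init__(self, event_bins=8, frame_ch=3, feat=32):
        super().__init__()
        self.event_encoder = nn.Sequential(
            nn.Conv2d(event_bins, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU(),
        )
        # Synthesis head: combines encoded events with both boundary frames.
        self.synth = nn.Conv2d(feat + 2 * frame_ch, frame_ch, 3, padding=1)
        # Refinement head: residual correction of the synthesized frame.
        self.refine = nn.Conv2d(frame_ch, frame_ch, 3, padding=1)

    def forward(self, frame0, frame1, events):
        e = self.event_encoder(events)
        coarse = self.synth(torch.cat([e, frame0, frame1], dim=1))
        return coarse + self.refine(coarse)

class SlowPathway(nn.Module):
    """Stand-in for the slow synthesis pathway: the two boundary frames
    and the event stream are stacked channel-wise and synthesized directly."""
    def __init__(self, event_bins=8, frame_ch=3, feat=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * frame_ch + event_bins, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, frame_ch, 3, padding=1),
        )

    def forward(self, frame0, frame1, events):
        return self.net(torch.cat([frame0, frame1, events], dim=1))

class Fusion(nn.Module):
    """Stand-in fusion module: predicts a per-pixel mask that blends the
    fast and slow results into the final interpolated frame."""
    def __init__(self, frame_ch=3):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(2 * frame_ch, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, fast_out, slow_out):
        m = self.mask(torch.cat([fast_out, slow_out], dim=1))
        return m * fast_out + (1 - m) * slow_out

# Usage: one intermediate frame from two boundary frames and events.
fast, slow, fuse = FastPathway(), SlowPathway(), Fusion()
f0 = torch.rand(1, 3, 128, 128)   # boundary frame at t=0
f1 = torch.rand(1, 3, 128, 128)   # boundary frame at t=1
ev = torch.rand(1, 8, 128, 128)   # voxelized event stream (assumed 8 bins)
out = fuse(fast(f0, f1, ev), slow(f0, f1, ev))  # (1, 3, 128, 128)
```

In the actual framework, repeating this synthesis at many timestamps between the two boundary frames is what yields the 200× frame-rate increase; the sketch only shows the single-frame data flow through the two pathways and the fusion step.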