Abstract: Video Frame Interpolation (VFI) has witnessed a surge in popularity due to its abundant downstream applications. Event-based VFI (E-VFI) has recently propelled the ad-vancement of VFI. Thanks to the high temporal resolution benefits, event cameras can bridge the informational void present between successive video frames. Most state-of-the-art E-VFI methodologies follow the conventional VFI paradigm, which pivots on motion estimation between consecutive frames to generate intermediate frames through a process of warping and refinement. However, this reliance engenders a heavy dependency on the quality and consis-tency of keyframes, rendering these methods susceptible to challenges in extreme real-world scenarios, such as missing moving objects and severe occlusion dilemmas. This study proposes a novel E-VFI framework that directly synthesize intermediate frames leveraging event-based reference, obviating the necessity for explicit motion estimation and substantially enhancing the capacity to handle motion occlusion. Given the sparse and inher-ently noisy nature of event data, we prioritize the relia-bility of the event-based reference, leading to the development of an innovative event-aware reconstruction strategy for accurate reference generation. Besides, we implement a bi-directional event-guided alignment from keyframes to the reference using the introduced E-PCD module. Finally, a transformer-based decoder is adopted for prediction re-finement. Comprehensive experimental evaluations on both synthetic and real-world datasets underscore the superiority of our approach and its potential to execute high-quality VFI tasks.
Loading