Learning to Handle Large Obstructions in Video Frame Interpolation

Published: 20 Jul 2024, Last Modified: 21 Jul 2024, MM2024 Poster, CC BY 4.0
Abstract: Video frame interpolation (VFI) based on optical flow has made great progress in recent years. Most previous studies have focused on improving interpolation quality on clean videos. However, many real-world videos contain large obstructions, which cause blur and artifacts in the interpolated frames and make the video discontinuous. To address this challenge, we propose the Obstruction Robustness Framework (ORF), which enhances the robustness of existing VFI networks to large obstructions. ORF has two components: (1) a feature repair module that first identifies ambiguous pixels in the synthesized frame using a region similarity map and then repairs them with a cross-overlap attention module; and (2) a data augmentation strategy that enables the network to handle dynamic obstructions without extra data. To the best of our knowledge, this is the first work that explicitly addresses the errors caused by large obstructions in video frame interpolation. Using previous state-of-the-art methods as backbones, our method not only improves results on the original benchmarks but also significantly enhances interpolation quality for videos with obstructions.
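The abstract does not say how the data augmentation strategy is implemented. As a rough illustration only, the sketch below shows one way dynamic obstructions could be synthesized without extra data: an opaque rectangle is pasted onto every frame of a training triplet and moved linearly over time. The function name `add_moving_obstruction`, its parameters, the rectangular shape, the random color, and the choice to obstruct all frames (including the target frame) are assumptions made for this example, not the authors' method.

```python
import numpy as np


def add_moving_obstruction(frames, rng, size=(16, 16), max_shift=8):
    """Paste one opaque rectangle onto each frame, moving linearly over time.

    frames: sequence of H x W x 3 uint8 arrays, evenly spaced in time.
    Returns (obstructed_frames, masks); the input frames are not modified.
    All design choices here are hypothetical, not taken from the paper.
    """
    h, w = frames[0].shape[:2]
    bh, bw = size
    # Assumed choices: random color, random start, random constant velocity.
    color = rng.integers(0, 256, size=3, dtype=np.uint8)
    y0 = int(rng.integers(0, h - bh + 1))
    x0 = int(rng.integers(0, w - bw + 1))
    vy, vx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))

    out, masks = [], []
    n = len(frames)
    for i, frame in enumerate(frames):
        t = i / (n - 1) if n > 1 else 0.0
        # Clamp so the rectangle stays fully inside the frame.
        y = int(np.clip(round(y0 + t * vy), 0, h - bh))
        x = int(np.clip(round(x0 + t * vx), 0, w - bw))
        mask = np.zeros((h, w), dtype=bool)
        mask[y:y + bh, x:x + bw] = True
        obstructed = frame.copy()
        obstructed[mask] = color  # broadcast the RGB color over masked pixels
        out.append(obstructed)
        masks.append(mask)
    return out, masks


# Usage on a dummy triplet (first frame, target middle frame, last frame).
rng = np.random.default_rng(0)
triplet = [np.zeros((64, 64, 3), dtype=np.uint8) for _ in range(3)]
aug, masks = add_moving_obstruction(triplet, rng)
```

Returning the masks alongside the frames is a convenience in this sketch: it would let a training loop weight or inspect the obstructed region separately. Whether the target frame should keep the obstruction is a design choice the abstract leaves open.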
Primary Subject Area: [Content] Media Interpretation
Secondary Subject Area: [Experience] Interactions and Quality of Experience
Relevance To Conference: Creating high-quality multimedia content (e.g., visual content) is an important task in Media Interpretation. This work addresses video frame interpolation in the presence of large obstructions. Multimedia applications often encounter large obstructions or reflections in real-world footage. However, previous state-of-the-art (SOTA) methods fail to handle this issue and generate large errors and artifacts in the predicted frame, which can significantly reduce video quality. This work proposes a new Obstruction Robustness Framework for video frame interpolation methods. We demonstrate that our method significantly reduces errors and artifacts when generating frames with large obstructions. This fits well within the area of Media Interpretation, since it involves generating new frames that are not explicitly captured by the camera. It is also closely related to the Interactions and Quality of Experience area, as we improve the video quality of scenes with large obstructions.
Supplementary Material: zip
Submission Number: 2166