Aligning Anything: Hierarchical Motion Estimation for Video Frame Interpolation

25 Sept 2024 (modified: 13 Nov 2024) · ICLR 2025 Conference Withdrawn Submission · CC BY 4.0
Keywords: Video frame interpolation; hierarchical motion estimation; pixel-level; target-level
TL;DR: We marry target-level motion with pixel-level motion to form a hierarchical motion estimation scheme for video frame interpolation.
Abstract: Existing advanced video frame interpolation (VFI) methods struggle to learn accurate per-pixel or target-level motion. Pixel-level motion estimation admits infinitely many solutions, making it difficult to guarantee fitting accuracy and global motion consistency, especially for rigid objects. Conversely, the target-level motion consistency of a single moving target breaks down when the assumption of object rigidity no longer holds. A hierarchical motion learning scheme is therefore needed to improve the accuracy and stability of motion prediction. Specifically, we marry target-level motion with pixel-level motion to form a hierarchical motion estimation scheme. It introduces specific semantic priors from open-world knowledge models such as the Recognize Anything Model (RAM), Grounding DINO, and the High-Quality Segment Anything Model (HQ-SAM) to facilitate latent target-level motion learning. In particular, a hybrid contextual feature extraction module (HCE) aggregates both pixel-wise and semantic representations, followed by a hierarchical motion and feature interactive refinement module (HIR) that models the current motion patterns. When these adaptations are integrated into existing SOTA VFI methods, more consistent motion estimation and interpolation are obtained. Extensive experiments show that advanced VFI networks equipped with our adaptations achieve superior performance on various benchmark datasets.
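
For illustration only, the sketch below (not the authors' code) shows one way the abstract's idea could be wired up in PyTorch: open-world models supply per-target masks, a stand-in HCE fuses them with pixel features, and a stand-in HIR predicts a residual on top of a base flow from the host VFI network. All module names, shapes, and interfaces are assumptions; the real RAM -> Grounding DINO -> HQ-SAM pipeline is replaced here by random dummy masks.

import torch
import torch.nn as nn

class HybridContextExtractor(nn.Module):
    """Illustrative stand-in for the paper's HCE: fuses pixel features with mask priors."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.pixel_enc = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1), nn.ReLU())
        # +1 channel for the collapsed semantic mask prior
        self.fuse = nn.Conv2d(feat_ch + 1, feat_ch, 1)

    def forward(self, frame, target_masks):
        # target_masks: (B, K, H, W) binary masks from an open-world segmenter
        pix = self.pixel_enc(frame)
        prior = target_masks.float().max(dim=1, keepdim=True).values  # collapse K targets
        return self.fuse(torch.cat([pix, prior], dim=1))

class HierarchicalRefiner(nn.Module):
    """Illustrative stand-in for the paper's HIR: predicts a residual flow from hybrid context."""
    def __init__(self, feat_ch=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(2 * feat_ch + 2, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 2, 3, padding=1))

    def forward(self, ctx0, ctx1, base_flow):
        return base_flow + self.head(torch.cat([ctx0, ctx1, base_flow], dim=1))

# Toy usage with dummy tensors; a real pipeline would obtain `masks0`/`masks1`
# from RAM -> Grounding DINO -> HQ-SAM and `base_flow` from the host VFI model.
B, K, H, W = 1, 4, 64, 64
f0, f1 = torch.rand(B, 3, H, W), torch.rand(B, 3, H, W)
masks0 = torch.rand(B, K, H, W) > 0.5
masks1 = torch.rand(B, K, H, W) > 0.5
base_flow = torch.zeros(B, 2, H, W)

hce, hir = HybridContextExtractor(), HierarchicalRefiner()
refined_flow = hir(hce(f0, masks0), hce(f1, masks1), base_flow)
print(refined_flow.shape)  # torch.Size([1, 2, 64, 64])

The key design point this sketch tries to capture is that the target-level prior enters as an extra feature channel rather than as a hard constraint, so the pixel-level branch can still deviate where the rigidity assumption fails.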
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 4502