ProMotion: Prototypes as Motion Learners

Published: 01 Jan 2024, Last Modified: 14 May 2025, Venue: CVPR 2024, License: CC BY-SA 4.0
Abstract: In this work, we introduce ProMotion, a unified prototypical transformer-based framework engineered to model fundamental motion tasks. ProMotion offers a range of compelling attributes that set it apart from current task-specific paradigms. (1) We adopt a prototypical perspective, establishing a unified paradigm that harmonizes disparate motion learning approaches. This novel paradigm streamlines the architectural design, enabling the simultaneous assimilation of diverse motion information. (2) We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion. This approach effectively circumvents the pitfalls of ambiguity in pixel-wise feature matching, significantly bolstering the robustness of motion representation. (3) We demonstrate a profound degree of transferability across distinct motion patterns. This inherent versatility carries over to a comprehensive spectrum of both 2D and 3D downstream tasks. Empirical results demonstrate that ProMotion outperforms various well-known specialized architectures, achieving 0.54 and 0.054 $AbsRel$ error on the Sintel and KITTI depth datasets, 1.04 and 2.01 average endpoint error on the clean and final passes of the Sintel flow benchmark, and 4.30 F1-all error on the KITTI flow benchmark. Given this efficacy, we hope our work can catalyze a paradigm shift toward universal models in computer vision.
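
For readers unfamiliar with the three metrics quoted in the abstract, the following is a minimal sketch of how they are commonly defined. It is not taken from the authors' code: the function names, the flat-list inputs, and the 3 px / 5% outlier thresholds (the usual KITTI convention) are assumptions made for illustration only.

```python
import math

# Illustrative helper names; none of these come from the ProMotion codebase.

def abs_rel(pred_depth, gt_depth):
    """Mean absolute relative depth error over pixels with positive ground truth."""
    pairs = [(p, g) for p, g in zip(pred_depth, gt_depth) if g > 0]
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)

def endpoint_errors(flow_pred, flow_gt):
    """Per-pixel Euclidean distance between predicted and true (u, v) flow vectors."""
    return [math.hypot(pu - gu, pv - gv)
            for (pu, pv), (gu, gv) in zip(flow_pred, flow_gt)]

def average_epe(flow_pred, flow_gt):
    """Average endpoint error: the mean of the per-pixel endpoint errors."""
    errs = endpoint_errors(flow_pred, flow_gt)
    return sum(errs) / len(errs)

def f1_all(flow_pred, flow_gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of outlier pixels. A pixel is counted as an outlier when its
    endpoint error exceeds both abs_thresh pixels and rel_thresh times the
    ground-truth flow magnitude (assumed KITTI-style thresholds)."""
    errs = endpoint_errors(flow_pred, flow_gt)
    mags = [math.hypot(gu, gv) for gu, gv in flow_gt]
    outliers = sum(1 for e, m in zip(errs, mags)
                   if e > abs_thresh and e > rel_thresh * m)
    return 100.0 * outliers / len(errs)
```

In practice these quantities are computed over dense image-sized arrays with validity masks; plain lists are used here only to keep the sketch dependency-free.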
