Learning Object-Centric Dynamic Modes from Video and Emerging Properties

22 Sept 2022 (modified: 13 Feb 2023) | ICLR 2023 Conference Withdrawn Submission | Readers: Everyone
Keywords: Koopman theory, dynamics, video representation learning, dynamic mode decomposition, video manipulation, object-centric decomposition
TL;DR: We propose a model for dynamics interpretability and manipulation by means of object-centric dynamic mode decomposition, directly from pixels.
Abstract: One of the long-term objectives of Artificial Intelligence is to endow machines with the capacity to structure and interpret the world as we do. Towards this goal, recent methods have successfully decomposed and disentangled video sequences into their constituent objects, attributes and dynamics in a self-supervised fashion. However, few efforts have been made to propose useful decompositions of the dynamics in a scene. We propose a method to decompose a video into moving objects, their attributes and the dynamic modes of their trajectories. We model the objects' dynamics with linear system identification tools, by means of a Koopman mapping and the Koopman operator $\mathcal{K}$. This gives the user access to, and an interpretation of, the dynamics in the scene. We test our framework on a variety of datasets and illustrate the novel features that emerge from our dynamic mode decomposition: temporal super-resolution, backwards forecasting, model reduction, and video dynamics interpretation and manipulation at test time. We successfully forecast challenging object trajectories from pixels, achieving competitive performance while drawing useful insights.
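The central idea of the abstract, fitting a linear operator to trajectories and reading off its dynamic modes, can be illustrated with a small sketch. This is not the submission's model: it applies plain dynamic mode decomposition with NumPy to a synthetic 2-D damped rotation that stands in for one object's latent state. The function names `fit_dmd` and `evolve`, the toy matrix `A`, and all constants are assumptions made for illustration only.

```python
import numpy as np

def fit_dmd(states):
    """Least-squares fit of K such that states[t + 1] ~= K @ states[t].

    states: array of shape (T, d), one (hypothetical) latent state per frame.
    Returns K, its eigenvalues, and its eigenvectors (the dynamic modes).
    """
    X = states[:-1].T            # (d, T-1) snapshots at time t
    Y = states[1:].T             # (d, T-1) snapshots at time t + 1
    K = Y @ np.linalg.pinv(X)    # minimises ||Y - K X||_F
    eigvals, modes = np.linalg.eig(K)
    return K, eigvals, modes

def evolve(x0, eigvals, modes, t):
    """State reached from x0 after t steps; t may be fractional or negative."""
    amplitudes = np.linalg.solve(modes, x0.astype(complex))
    # Principal-branch power: a continuous-time reading of the discrete modes.
    return (modes @ (eigvals.astype(complex) ** t * amplitudes)).real

# Toy data (assumption): a rotation by 0.3 rad per frame with 2% decay.
theta, decay, T = 0.3, 0.98, 20
A = decay * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
states = np.empty((T, 2))
states[0] = [1.0, 0.0]
for t in range(1, T):
    states[t] = A @ states[t - 1]

K, eigvals, modes = fit_dmd(states)

forward = evolve(states[0], eigvals, modes, 5)     # forecasting
backward = evolve(states[5], eigvals, modes, -5)   # backwards forecasting
half_step = evolve(states[0], eigvals, modes, 0.5) # temporal super-resolution
```

In this toy setting the eigenvalue magnitudes give the decay rate and their phases give the rotation frequency, which is the sense in which the modes are interpretable. Evaluating `evolve` at negative or fractional `t` mirrors the backwards forecasting and temporal super-resolution listed in the abstract, and rescaling an eigenvalue before calling `evolve` would be one way to manipulate the dynamics.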
Please Choose The Closest Area That Your Submission Falls Into: Applications (eg, speech processing, computer vision, NLP)
Supplementary Material: zip