Reasoning-Enhanced Object-Centric Learning for Videos

20 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: object-centric learning, spatiotemporal attention, intuitive physics, reasoning, prediction
TL;DR: This paper introduces a novel reasoning module called the Slot-based Time-Space Transformer with Memory buffer (STATM) to enhance the deep learning model's perception ability in complex scenes.
Abstract: Object-centric learning aims to break down complex visual scenes into more manageable object representations, enhancing the understanding and reasoning abilities of machine learning systems toward the physical world. Recently, slot-based video models have demonstrated remarkable proficiency in segmenting and tracking objects. Although most modules in these models are well designed, they overlook the importance of an effective reasoning module. In the real world, especially in complex scenes, reasoning and predictive abilities play a crucial role in human perception and object tracking; in particular, these abilities are closely related to human intuitive physics. Inspired by this, we design a novel reasoning module called the Slot-based Time-Space Transformer with Memory buffer (STATM) to enhance the model's perception ability in complex scenes. The memory buffer primarily serves as storage for slot information from upstream modules, akin to human memory or field of view. The Slot-based Time-Space Transformer makes predictions through slot-based spatiotemporal attention computation and fusion. We demonstrate that the improved deep learning model exhibits a certain degree of rationality in imitating human behavior. This has crucial implications for understanding the relationship between deep learning and human cognition, especially in the context of intuitive physics.
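The abstract describes STATM as a memory buffer of past slot states combined with slot-based temporal and spatial attention whose outputs are fused into a prediction. The following is a minimal NumPy sketch of that idea only; the function names, the FIFO buffer update, and the averaging fusion are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: q (..., Lq, D), k/v (..., Lk, D).
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def statm_step(memory, current_slots):
    """One hypothetical STATM reasoning step (illustrative, not the paper's code).

    memory:        (T, K, D) buffer of slot states from T past frames
                   (the paper's memory buffer; its exact contents are an assumption).
    current_slots: (K, D) slots for the current frame.
    Returns a fused (K, D) slot prediction.
    """
    # Temporal attention: each slot attends to its own history across time.
    hist = np.transpose(memory, (1, 0, 2))                 # (K, T, D)
    q = current_slots[:, None, :]                          # (K, 1, D)
    temporal = attention(q, hist, hist)[:, 0, :]           # (K, D)
    # Spatial attention: slots of the current frame attend to each other.
    spatial = attention(current_slots, current_slots, current_slots)
    # Fusion: simple average of the two streams (one plausible choice).
    return 0.5 * (temporal + spatial)

# Usage: roll the memory buffer forward one frame at a time.
rng = np.random.default_rng(0)
memory = rng.normal(size=(4, 5, 8))   # 4 past frames, 5 slots, slot dim 8
slots = rng.normal(size=(5, 8))
pred = statm_step(memory, slots)
memory = np.concatenate([memory[1:], pred[None]], axis=0)  # FIFO buffer update
```

The key design point the sketch mirrors is that attention is computed per slot over time (object-wise history) and per frame over slots (scene-wise interaction) before fusion, rather than over raw pixels.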
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 2724