SlotFormer: Long-Term Dynamic Modeling in Object-Centric Models

Published: 09 Jul 2022, Last Modified: 05 May 2023. CRL@UAI 2022 Poster.
Keywords: Object-centric learning, dynamics modeling, Transformer
TL;DR: We propose a general Transformer-based dynamics model that enables consistent future rollouts in object-centric models.
Abstract: Understanding dynamics from visual observations is a challenging problem that requires disentangling individual objects from the scene and learning their interactions. While recent object-centric models can successfully decompose a scene into objects, effectively modeling their dynamics remains a challenge. We address this problem by introducing SlotFormer, a Transformer-based autoregressive model operating on learned object-centric representations. Given a video clip, our approach performs dynamic reasoning over object features to model spatiotemporal object relationships and generate realistic future frames. In this paper, we successfully apply SlotFormer to the problem of consistent long-term dynamic modeling in object-centric models. We compare SlotFormer to image-based video prediction models and object-centric dynamics models on two synthetic video datasets consisting of complex object interactions. Our method generates videos of high quality as measured by conventional video prediction metrics, while achieving significantly better long-term synthesis of object dynamics.
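The autoregressive rollout described in the abstract can be sketched in a highly simplified form: flatten a short history of per-frame object slots into a token sequence, apply self-attention over all tokens, read off the predicted next-frame slots, and feed them back in. The snippet below is a toy single-head NumPy illustration under stated assumptions; all function names, parameter shapes, and the single-layer attention are illustrative simplifications, not the paper's actual architecture.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    # x: (tokens, d). Toy single-head scaled dot-product attention.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def rollout(slots, steps, params):
    # slots: (T, N, d) burn-in history of N object slots over T frames.
    # Autoregressively predict `steps` future frames of slots.
    T, N, d = slots.shape
    history = [slots[t] for t in range(T)]
    for _ in range(steps):
        # Flatten the last T frames of slots into one token sequence.
        tokens = np.concatenate(history[-T:], axis=0)  # (T*N, d)
        out = self_attention(tokens, *params)
        # Read the last N output tokens as the next frame's slots.
        history.append(out[-N:])
    return np.stack(history[T:])  # (steps, N, d)
```

In the real model the predicted slots would be decoded back to pixels by the pretrained object-centric decoder; here the sketch only shows the slot-space rollout loop.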