SlotFormer: Long-Term Dynamic Modeling in Object-Centric Models

Published: 09 Jul 2022, Last Modified: 05 May 2023. CRL@UAI 2022 Poster.
Keywords: Object-centric learning, dynamics modeling, Transformer
TL;DR: We propose a general Transformer-based dynamics model that enables consistent future rollouts in object-centric models.
Abstract: Understanding dynamics from visual observations is a challenging problem that requires disentangling individual objects from the scene and learning their interactions. While recent object-centric models can successfully decompose a scene into objects, effectively modeling their dynamics remains a challenge. We address this problem by introducing SlotFormer, a Transformer-based autoregressive model operating on learned object-centric representations. Given a video clip, our approach performs dynamic reasoning over object features to model spatiotemporal object relationships and generate realistic future frames. In this paper, we successfully apply SlotFormer to the problem of consistent long-term dynamic modeling in object-centric models. We compare SlotFormer to image-based video prediction models and object-centric dynamics models on two synthetic video datasets consisting of complex object interactions. Our method generates videos of high quality as measured by conventional video prediction metrics, while achieving significantly better long-term synthesis of object dynamics.
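The autoregressive rollout described in the abstract can be sketched in a highly simplified form: flatten a short history of per-frame object slots into a token sequence, apply self-attention over all tokens, read off the predicted next-frame slots, and feed them back in. The snippet below is a toy single-head NumPy illustration under stated assumptions; all function names, parameter shapes, and the single-layer attention are illustrative simplifications, not the paper's actual architecture.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    # x: (tokens, d). Toy single-head scaled dot-product attention.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def rollout(slots, steps, params):
    # slots: (T, N, d) burn-in history of N object slots over T frames.
    # Autoregressively predict `steps` future frames of slots.
    T, N, d = slots.shape
    history = [slots[t] for t in range(T)]
    for _ in range(steps):
        # Flatten the last T frames of slots into one token sequence.
        tokens = np.concatenate(history[-T:], axis=0)  # (T*N, d)
        out = self_attention(tokens, *params)
        # Read the last N output tokens as the next frame's slots.
        history.append(out[-N:])
    return np.stack(history[T:])  # (steps, N, d)
```

In the real model the predicted slots would be decoded back to pixels by the pretrained object-centric decoder; here the sketch only shows the slot-space rollout loop.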