Slot Structured World Models

TMLR Paper3099 Authors

31 Jul 2024 (modified: 10 Oct 2024), under review for TMLR, CC BY 4.0
Abstract: The ability to perceive and reason about individual objects enables humans to build a robust understanding of the environment and its dynamics. Replicating such abilities in artificial systems would represent a significant milestone toward building intelligent agents. Contrastive Learning of Structured World Models (C-SWMs) took a step in this direction, proposing an unsupervised approach that embeds images as compositions of individual object representations and models their pairwise relationships. However, the C-SWM encoder cannot disambiguate different objects that share the same visual features, and the method has only been tested in settings where encoding just the object position and velocity was sufficient to learn the dynamics of the environment. To address these limitations, we introduce Slot Structured World Models (SSWMs), a class of world models that augment C-SWMs with a pretrained object-centric encoder. We further propose a version of the Spriteworld environment that includes simple physics to challenge these models. Quantitative and qualitative measures show that the proposed method outperforms the baseline on this environment, although it shows severe limitations in multi-step prediction.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Florian_Shkurti1
Submission Number: 3099