Keywords: Unsupervised object-centric learning, Slot attention, Causality-inspired representation learning
TL;DR: Inverted-attention transformers can learn object representations.
Abstract: Visual reasoning is supported by a causal understanding of the physical world, and theories of human cognition posit that a necessary step toward causal understanding is the discovery and representation of high-level entities such as objects. Slot Attention is a popular method aimed at object-centric learning, and its popularity has resulted in dozens of variants and extensions. To help understand the core assumptions that lead to successful object-centric learning, we take a step back and identify the minimal set of changes to a standard Transformer architecture needed to match the performance of specialized Slot Attention models. We systematically evaluate the performance and scaling behaviour of several "intermediate" architectures on seven image and video datasets from prior work. Our analysis reveals that by simply inverting the attention mechanism of Transformers, we obtain performance competitive with state-of-the-art Slot Attention variants in several domains.
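To make "inverting the attention mechanism" concrete, the following is a minimal PyTorch sketch, assuming the inversion follows the Slot-Attention-style convention of normalizing the attention softmax over the query (slot) axis rather than the key axis, with a subsequent weighted mean over inputs; the exact formulation in the paper may differ, and all names here are illustrative.

```python
import torch


def standard_attention(q, k, v):
    """Standard cross-attention: q (B, Nq, D), k/v (B, Nk, D)."""
    logits = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (B, Nq, Nk)
    attn = logits.softmax(dim=-1)  # normalize over keys: each query attends over inputs
    return attn @ v


def inverted_attention(q, k, v, eps=1e-8):
    """Inverted attention: softmax over the query (slot) axis, so input
    tokens compete to be assigned to slots (as in Slot Attention)."""
    logits = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (B, Nq, Nk)
    attn = logits.softmax(dim=-2)  # normalize over queries/slots
    attn = attn / (attn.sum(dim=-1, keepdim=True) + eps)  # weighted mean over inputs per slot
    return attn @ v


if __name__ == "__main__":
    B, num_slots, num_tokens, dim = 2, 7, 196, 64
    slots = torch.randn(B, num_slots, dim)    # queries (slot vectors)
    tokens = torch.randn(B, num_tokens, dim)  # keys/values from an image encoder
    print(standard_attention(slots, tokens, tokens).shape)  # torch.Size([2, 7, 64])
    print(inverted_attention(slots, tokens, tokens).shape)  # torch.Size([2, 7, 64])
```

The only change between the two functions is the softmax axis (plus the renormalization it necessitates), which is what induces competition among slots for input tokens.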
Submission Number: 45