Keywords: Spatial symmetry, Equivariance, Abstraction, Object-centric learning, Unsupervised learning
TL;DR: Translation and scaling equivariance in Slot Attention can lead to large gains in scene decomposition performance.
Abstract: Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Slot-based neural networks have recently shown promise at discovering and representing objects in visual scenes in a self-supervised fashion. While they exploit the permutation symmetry of objects to drive the learning of abstractions, they largely ignore other spatial symmetries present in the visual world. In this work, we introduce a simple yet effective method for incorporating spatial symmetries into attentional slot-based methods. We build equivariance to translation and scale into the attention and generation mechanisms of Slot Attention solely by translating and scaling positional encodings. Both changes incur little computational overhead, are easy to implement, and can yield large gains in data efficiency and scene decomposition performance.
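The core mechanism described above can be sketched in a few lines: each slot carries a position and scale, and the shared coordinate grid is translated and scaled into each slot's frame before being projected into a positional encoding. This is a minimal illustrative sketch, not the paper's implementation; the function names, the linear projection `proj`, and the way slot positions/scales are obtained are all assumptions.

```python
import numpy as np

def build_grid(h, w):
    # Normalized pixel coordinates in [-1, 1]^2, shape (h*w, 2).
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    return np.stack([ys, xs], axis=-1).reshape(-1, 2)

def relative_positional_encoding(grid, slot_pos, slot_scale, proj):
    # Translate and scale the coordinate grid into each slot's reference
    # frame, then project to the feature dimension with a matrix `proj`
    # (hypothetical learned parameter).
    # grid: (n, 2); slot_pos, slot_scale: (k, 2); proj: (2, d).
    rel = (grid[None, :, :] - slot_pos[:, None, :]) / slot_scale[:, None, :]  # (k, n, 2)
    return rel @ proj  # per-slot encodings, shape (k, n, d)
```

In such a scheme, each slot's position and scale would typically be estimated from its attention map (e.g. as the attention-weighted mean and spread over the grid), so that the positional encodings follow the slot as objects move or change size.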