Keywords: Representation Learning, Disentanglement, Object-Centric Learning, Transformers, Compositionality
Abstract: Learning disentangled representations of objects in an image is a prerequisite for the robust compositional generalization characteristic of human intelligence. While progress has been made in object-centric representation learning (OCRL), existing methods rely on strong architectural priors that hinder scalability. In this work, we explore a more scalable approach to OCRL. Namely, we propose to use a general-purpose architecture and add inductive biases to the model via additional regularizers. To formulate suitable regularizers, we take inspiration from recent theoretical results that put forth two properties a model should satisfy to provably disentangle objects. We show that these properties can be scalably enforced using a VAE loss and a novel loss on the attention weights of a Transformer. We incorporate these regularizers into a general-purpose Transformer autoencoder and attain performance competitive with, and often superior to, existing OCRL methods that rely on stronger architectural priors.
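To make the training objective concrete, below is a minimal sketch of how the two regularizers described in the abstract might be combined with a reconstruction loss. This is an illustrative assumption, not the paper's exact formulation: the function names (`vae_kl_loss`, `attention_entropy_loss`, `total_loss`), the weights `beta` and `gamma`, and the specific entropy-based form of the attention penalty are all hypothetical placeholders for the VAE loss and the attention-weight loss the abstract refers to.

```python
import torch
import torch.nn.functional as F

def vae_kl_loss(mu, logvar):
    # Standard VAE regularizer: KL divergence between the approximate
    # posterior N(mu, exp(logvar)) and a unit Gaussian prior over the latents.
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

def attention_entropy_loss(attn):
    # attn: (batch, n_tokens, n_latents) cross-attention weights of the
    # Transformer, assumed to sum to 1 over the latent axis.
    # Penalizing the entropy of each token's attention distribution pushes
    # every token to rely on only a few latents; this is one possible
    # (assumed) instantiation of a loss on the attention weights.
    eps = 1e-8
    return (-(attn * (attn + eps).log()).sum(dim=-1)).mean()

def total_loss(x, x_recon, mu, logvar, attn, beta=1.0, gamma=0.1):
    # Reconstruction term of the Transformer autoencoder plus the two
    # regularizers, with hypothetical weights beta and gamma.
    recon = F.mse_loss(x_recon, x)
    return recon + beta * vae_kl_loss(mu, logvar) + gamma * attention_entropy_loss(attn)
```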
Submission Number: 8