Keywords: object-centric learning, causal representation learning, composition, scene graphs
TL;DR: We provide a unifying perspective on causal representation learning and object-centric learning, grounded in causal abstractions for scene graph modelling.
Abstract: The goal of Object-Centric Learning (OCL) is to enable machine learning systems to decompose complex scenes into discrete, interacting objects, supporting compositional generalization and human-like reasoning. However, existing OCL methods often fail to capture interactions at both the attribute level (semantic) and the object level (spatial). While scene graph methods complement OCL by abstracting scenes as structured graphs, they typically rely on supervision. This position paper argues for a probabilistic perspective on Scene Graph Modelling (SGM) grounded in causal abstraction, which serves as a unifying view on causality, OCL, and scene graphs. By treating object interactions as invariant mechanisms within object-level graphs, this perspective enables the generation of causally consistent scene compositions. We substantiate our position with thorough conceptual discussion, rigorous definitions, conjectures, and examples, showing how it bridges the gap between unsupervised object discovery and explicit scene graph reasoning.
Submission Number: 32