Identifiable Object Representations under Spatial Ambiguities

Published: 01 May 2025, Last Modified: 18 Jun 2025, ICML 2025 poster, CC BY-NC-SA 4.0
TL;DR: We introduce a probabilistic model that resolves spatial ambiguities and provides theoretical guarantees for identifiability without additional viewpoint annotations.
Abstract: Modular object-centric representations are essential for *human-like reasoning* but are challenging to obtain under spatial ambiguities, *e.g., due to occlusions and view ambiguities*. Addressing these ambiguities presents both theoretical and practical difficulties. We introduce a novel multi-view probabilistic approach that aggregates view-specific slots to capture *invariant content* information while simultaneously learning disentangled global *viewpoint-level* information. Unlike prior single-view methods, our approach resolves spatial ambiguities, provides theoretical guarantees for identifiability, and requires *no viewpoint annotations*. Extensive experiments on standard benchmarks and novel complex datasets validate our method's robustness and scalability.
Lay Summary: Object-centric learning focuses on extracting distinct representations for individual objects within a scene, as opposed to learning a single global representation for the entire scene. A key challenge arises when objects are only partially visible or the scene is viewed from oblique or obscure angles—issues collectively referred to as spatial ambiguities. In this paper, we propose a method specifically designed to address these ambiguities. Our approach involves observing a given scene from multiple viewpoints and leveraging the resulting perspectives to correlate and integrate object-specific features, thereby producing a unified, viewpoint-invariant representation for each object. We provide both theoretical justification and empirical evidence demonstrating that this multi-view correlation strategy yields more robust and reliable object representations.
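The summaries above do not spell out how object-specific features from different viewpoints are correlated and integrated. The following is only a minimal, hypothetical sketch of the general idea, not the paper's probabilistic model. It assumes each view yields the same number K of slot vectors in an arbitrary order, aligns every view's slots to a reference view by brute-force matching (a deterministic stand-in chosen for simplicity), and then averages the aligned slots into one representation per object. The names `align_to_reference` and `aggregate_slots` are illustrative assumptions.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two slot vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_to_reference(reference: List[Vector], slots: List[Vector]) -> List[Vector]:
    """Hypothetical helper: reorder `slots` so slots[i] best matches reference[i].

    Searches all K! permutations, so it is only viable for small K;
    a Hungarian-algorithm solver would be the usual choice for larger K.
    """
    k = len(reference)
    best_perm: Tuple[int, ...] = tuple(range(k))
    best_cost = float("inf")
    for perm in permutations(range(k)):
        cost = sum(sq_dist(reference[i], slots[perm[i]]) for i in range(k))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return [slots[j] for j in best_perm]


def aggregate_slots(views: List[List[Vector]]) -> List[Vector]:
    """Hypothetical helper: align every view to view 0, then average per object."""
    reference = views[0]
    aligned = [reference] + [align_to_reference(reference, v) for v in views[1:]]
    k, d, n = len(reference), len(reference[0]), len(aligned)
    # Element-wise mean over views of the i-th aligned slot.
    return [
        [sum(view[i][c] for view in aligned) / n for c in range(d)]
        for i in range(k)
    ]
```

For example, if a second view lists the same two objects in swapped order, `aggregate_slots([[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.5], [1.5, 0.0]]])` first matches the slots and then returns `[[1.25, 0.0], [0.0, 1.25]]`. The matching step is needed because slot order is arbitrary within each view. The paper's approach is described as probabilistic and also separates out global viewpoint-level information, which this toy averaging does not attempt.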
Primary Area: Deep Learning->Generative Models and Autoencoders
Keywords: Object-centric learning, identifiability, spatial ambiguities
Submission Number: 5759