Stitching Manifolds: Leveraging Interaction to Compose Object Representations into Scenes.

Published: 17 Jun 2024, Last Modified: 12 Jul 2024, ICML 2024 Workshop GRaM, CC BY 4.0
Track: Extended abstract
Keywords: Representation Learning, Group Theory, Equivariance, Navigation, Compositional Generalization.
TL;DR: We propose a stitching procedure to compose single-object, group-structured representations into scene representations.
Abstract: In this work, we address the problem of generalization by leveraging interaction to compose previously acquired knowledge. We show that the problem of long-distance navigation can be naturally decomposed into local navigation around multiple previously known landmarks. Since these landmarks enter and exit the agent's field of view and frequently occlude each other, they must be considered collectively. We propose a two-step approach. First, an agent acquires group-structured representations of individual objects by navigating around them and observing how its movements change the view. Second, we introduce a stitching procedure that combines the learned individual object manifolds into a coherent representation of the scene. The stitched representation is a group-structured representation of the whole scene: it can be maintained from any object in view and predicts the poses of all other objects. As a result, the agent learns a modular and data-efficient world model for navigating the scene. Relying solely on interaction, this model enables the agent to situate itself, predict how its pose evolves under the actions it performs, and infer the actions connecting two observations.
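To make the group-theoretic idea behind stitching concrete, here is a minimal sketch, not the authors' implementation. It assumes that each learned object representation can be decoded to the agent's planar pose in that object's frame, i.e. an element of SE(2), and it ignores noise. All names (`SE2`, `stitch`, `relocalize`, `predict_object_pose`, `step`, `infer_action`) are hypothetical. Under these assumptions, stitching two object manifolds reduces to composing the two poses observed in a single view where both objects are visible; localization, pose prediction, and action inference then follow from group composition and inversion alone.

```python
import math
from dataclasses import dataclass

# Hypothetical illustration: assumes each object representation decodes to the
# agent's planar pose in that object's frame (an element of SE(2)).
# This is not the paper's learned model, only the group algebra behind stitching.


def _wrap(theta: float) -> float:
    """Map an angle to [-pi, pi]."""
    return math.atan2(math.sin(theta), math.cos(theta))


@dataclass(frozen=True)
class SE2:
    """Planar rigid motion p -> R(theta) p + (x, y)."""
    x: float
    y: float
    theta: float

    def __matmul__(self, other: "SE2") -> "SE2":
        # Group composition: apply `other` first, then `self`.
        c, s = math.cos(self.theta), math.sin(self.theta)
        return SE2(self.x + c * other.x - s * other.y,
                   self.y + s * other.x + c * other.y,
                   _wrap(self.theta + other.theta))

    def inverse(self) -> "SE2":
        # Inverse motion p -> R(-theta) (p - (x, y)).
        c, s = math.cos(self.theta), math.sin(self.theta)
        return SE2(-(c * self.x + s * self.y),
                   -(-s * self.x + c * self.y),
                   _wrap(-self.theta))


def approx_equal(g: SE2, h: SE2, tol: float = 1e-9) -> bool:
    """Compare two poses, treating angles modulo 2*pi."""
    return (abs(g.x - h.x) <= tol and abs(g.y - h.y) <= tol
            and abs(_wrap(g.theta - h.theta)) <= tol)


def stitch(pose_in_a: SE2, pose_in_b: SE2) -> SE2:
    """Transform from object B's frame to object A's frame, T_{A<-B},
    estimated from one view in which both objects are visible."""
    return pose_in_a @ pose_in_b.inverse()


def relocalize(pose_in_a: SE2, a_from_b: SE2) -> SE2:
    """Agent pose in B's frame when only A is in view."""
    return a_from_b.inverse() @ pose_in_a


def predict_object_pose(pose_in_a: SE2, a_from_b: SE2) -> SE2:
    """Pose of object B expressed in the agent's frame, T_{agent<-B}."""
    return pose_in_a.inverse() @ a_from_b


def step(pose: SE2, action: SE2) -> SE2:
    """Pose after executing `action`, given as a motion in the agent's own frame."""
    return pose @ action


def infer_action(pose_before: SE2, pose_after: SE2) -> SE2:
    """Action connecting two observations of the same object."""
    return pose_before.inverse() @ pose_after


# Usage: one view shows both objects, so their manifolds can be stitched.
pose_in_a = SE2(1.0, 0.0, 0.0)
pose_in_b = SE2(-2.0, 1.0, math.pi / 2)
a_from_b = stitch(pose_in_a, pose_in_b)

action = SE2(0.5, 0.0, 0.3)
moved_in_a = step(pose_in_a, action)  # B may now be out of view
print(approx_equal(relocalize(moved_in_a, a_from_b), step(pose_in_b, action)))  # True
print(approx_equal(infer_action(pose_in_a, moved_in_a), action))                # True
```

The sketch stores only relative transforms between objects, so whichever object is currently in view suffices to recover the agent's pose in every other object's frame. It assumes exact, noise-free pose readouts; with noisy estimates, errors in a stitched transform would propagate to every prediction made through it.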
Submission Number: 96