Object-Centric Latent Action Learning

Published: 28 Feb 2025, Last Modified: 17 Apr 2025WRL@ICLR 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0
Track: full paper
Keywords: imitation learning from observations, learning from video, latent action learning, object-centric learning, slot attention
Abstract: Leveraging vast amounts of internet video data for Embodied AI is currently bottle-necked by the lack of action annotations and the presence of action-correlated distractors. We propose a novel object-centric latent action learning approach, based on VideoSaur and LAPO, that employs self-supervised decomposition of scenes into object representations and annotates video data with proxy-action labels. This method effectively disentangles causal agent-object interactions from irrelevant background noise and reduces the performance degradation of latent action learning approaches caused by distractors. Our preliminary experiments with the Distracting Control Suite show that latent action pretraining based on object decompositions improve the quality of inferred latent actions by **x2.7** and efficiency of downstream fine-tuning with a small set of labeled actions, increasing return by **x2.6** on average.
Presenter: ~Albina_Klepach1
Format: Maybe: the presenting author will attend in person, contingent on other factors that still need to be determined (e.g., visa, funding).
Funding: No, the presenting author of this submission does *not* fall under ICLR’s funding aims, or has sufficient alternate funding.
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 78
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview