Toward Compositional Latent Action Interfaces for Generalizable Agents

Published: 27 May 2026, Last Modified: 17 Jun 2026
Venue: CompLearn 2026 Poster
License: CC BY 4.0
Keywords: Latent Action Model, Compositionality, Factorization
TL;DR: We suggest that a generalizable latent action interface may need reusable observed-effect primitives before monolithic action-like latents.
Abstract: Latent Action Models (LAMs) learn action proxies from observation transitions, but they face a fundamental ambiguity in multi-object or distractor-rich scenes: without supervision, the model cannot determine which changes are caused by the controlled agent. When multiple transition sources and environment-specific factors are compressed into a single monolithic latent action, the resulting representation can become sensitive to distractors and may generalize poorly under distribution shift. In this paper, we introduce Observed Transition Factorization (OTF), a framework that represents each transition as a sparse composition of reusable observed-transition primitives, yielding a compositional latent action interface. Building on this representation, we propose OTF-LAM, a latent action model that constructs action-like latents from factorized transition primitives within the standard inverse-forward dynamics framework. Empirically, OTF primitives transfer zero-shot across controlled carrier shifts and cross-morphology DCS settings, support downstream policy learning through OTF-LAM, and provide a promising path toward decoder-free latent action modeling with frozen visual representations.
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 189