Keywords: State-Action Abstractions, Predictive Coding, Hierarchical Planning, Compositional World Models, Contrastive Learning, Hypernetworks, Hierarchical Reinforcement Learning
TL;DR: Composer learns scalable, compositional state-action abstractions that are effective for downstream hierarchical RL and planning, even for unseen goals and environments.
Abstract: We present a modular and compositional approach to learning human-aligned world models via state-action hierarchies. Our approach is inspired by sensory-motor hierarchies in the mammalian brain. We model complex state transition dynamics as a sequence of simpler dynamics, which in turn can be modeled using even simpler dynamics, and so on, endowing the approach with rich compositionality. We introduce Composer, a practical method for learning complex world models that leverages hypernetworks and abstract states to generate lower-level transition functions on the fly. We first show that state abstractions in Composer emerge naturally in simple environments as a consequence of training. Incorporating a variant of contrastive learning allows Composer to scale to more complex environments while ensuring that the learned abstractions are human-aligned. Additionally, learning a higher-level transition function between learned abstract states yields a hierarchy of transition functions for modeling complex dynamics. We apply Composer to compositional navigation problems and show its capability for rapid planning and transfer to novel scenarios. In both traditional grid-world navigation problems and the more complex Habitat vision-based navigation domain, a Composer-based agent learns to model the state-action dynamics within and between different rooms using a hierarchy of transition functions and to leverage this hierarchy for efficient downstream planning. Our results suggest that Composer offers a promising framework for learning the complex dynamics of real-world environments using a compositional and interpretable approach.
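To make the abstract's core mechanism concrete, below is a minimal sketch (not the authors' implementation) of the two components it describes: a hypernetwork that maps an abstract state to the parameters of a lower-level transition function, and a higher-level transition model over abstract states. All module names, layer sizes, and the use of PyTorch are illustrative assumptions; the paper's actual architecture may differ.

```python
# Illustrative sketch only: module names, sizes, and framework are assumptions,
# not the paper's actual architecture. Handles a single (unbatched) transition.
import torch
import torch.nn as nn


class HyperTransition(nn.Module):
    """Low-level transition s' = f_z(s, a) whose weights are generated
    on the fly by a hypernetwork conditioned on the abstract state z."""

    def __init__(self, state_dim, action_dim, abstract_dim, hidden_dim=64):
        super().__init__()
        in_dim = state_dim + action_dim
        # Shapes of the generated two-layer MLP: (W1, b1, W2, b2).
        self.shapes = [(hidden_dim, in_dim), (hidden_dim,),
                       (state_dim, hidden_dim), (state_dim,)]
        n_params = sum(torch.Size(s).numel() for s in self.shapes)
        # Hypernetwork: abstract state -> flat parameter vector.
        self.hyper = nn.Sequential(
            nn.Linear(abstract_dim, 128), nn.ReLU(), nn.Linear(128, n_params))

    def forward(self, state, action, z):
        params = self.hyper(z)
        # Slice the flat vector into the generated MLP's weights and biases.
        chunks, offset = [], 0
        for shape in self.shapes:
            size = torch.Size(shape).numel()
            chunks.append(params[offset:offset + size].view(shape))
            offset += size
        w1, b1, w2, b2 = chunks
        x = torch.cat([state, action], dim=-1)
        h = torch.relu(x @ w1.T + b1)
        return h @ w2.T + b2  # predicted next low-level state


class AbstractTransition(nn.Module):
    """Higher-level transition z' = g(z, u) between abstract states,
    where u is a high-level action/option embedding (assumed)."""

    def __init__(self, abstract_dim, option_dim, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(abstract_dim + option_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, abstract_dim))

    def forward(self, z, u):
        return self.net(torch.cat([z, u], dim=-1))
```

Under this reading, planning can alternate between the two levels: the abstract model proposes a sequence of abstract states (e.g., rooms), and the hypernetwork instantiates a local transition function for rolling out low-level dynamics within each abstract state.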
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 13341