Successor Representations Enable Emergent Compositional Instruction Following

Published: 24 Oct 2024, Last Modified: 09 Nov 2024 · LEAP 2024 Poster · CC BY 4.0
Keywords: robotic manipulation, long-horizon tasks, instruction following, representation learning
TL;DR: Learning temporally consistent ("successor") representations can enable zero-shot compositional generalization in real goal- and language-conditioned robotic manipulation tasks, without explicit planning or reinforcement learning.
Abstract: Effective task representations should facilitate compositionality: after learning a variety of basic tasks, an agent should be able to perform compound tasks consisting of multiple steps simply by composing the representations of the constituent steps. While this idea is conceptually simple and appealing, it is not clear how to automatically learn representations that enable this sort of compositionality. We show that learning to associate the representations of current and future states with a temporal alignment loss can improve compositional generalization, even in the absence of any explicit subtask planning or reinforcement learning. The resulting approach generalizes to novel composite tasks specified as goal images or language instructions, without requiring additional reward supervision. We evaluate our approach across diverse tabletop robotic manipulation tasks, showing substantial improvements for tasks specified with either language or goal images.
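To make the abstract's core idea concrete, here is a minimal NumPy sketch of one common way a temporal alignment loss over current- and future-state representations can be written: an InfoNCE-style contrastive objective in which embeddings of states from the same trajectory are pulled together and other batch elements serve as negatives. This is an illustrative assumption, not the paper's exact objective, and all function and variable names (`temporal_alignment_loss`, `phi_s`, `psi_g`) are hypothetical.

```python
import numpy as np

def temporal_alignment_loss(phi_s, psi_g, temperature=0.1):
    """InfoNCE-style temporal alignment loss (illustrative sketch).

    phi_s: (B, D) embeddings of current states.
    psi_g: (B, D) embeddings of future states; row i of psi_g is the
           positive (same-trajectory future) for row i of phi_s, and
           all other rows act as negatives.
    """
    # L2-normalize so similarities are cosine similarities.
    phi = phi_s / np.linalg.norm(phi_s, axis=1, keepdims=True)
    psi = psi_g / np.linalg.norm(psi_g, axis=1, keepdims=True)

    # (B, B) similarity matrix; matched pairs lie on the diagonal.
    logits = phi @ psi.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability

    # Cross-entropy with the diagonal as the target class.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))
```

Under such an objective, embeddings become temporally consistent: a state's representation predicts its successors, which is what allows compound tasks to be addressed by composing the learned representations of their steps.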
Submission Number: 36