Policy Architectures for Compositional Generalization in Control

08 Oct 2022, 17:47 (modified: 09 Dec 2022, 14:31) | Deep RL Workshop 2022 | Readers: Everyone
Keywords: architectures, geometry, symmetry, control
TL;DR: A framework for compositional multi-entity tasks that motivates efficient policy architectures.
Abstract: Several tasks in control, robotics, and planning can be specified through desired goal configurations for entities in the environment. Learning goal-conditioned policies is a natural paradigm for solving such tasks. However, learning and generalizing on complex tasks can be challenging due to variations in the number of entities or in the composition of goals. To address this challenge, we introduce the Entity-Factored Markov Decision Process (EFMDP), a formal framework for modeling the entity-based compositional structure in control tasks. Geometrical properties of the EFMDP framework provide theoretical motivation for policy architecture design, particularly Deep Sets and popular relational mechanisms such as graphs and self-attention. These structured policy architectures are flexible and can be trained end-to-end with standard reinforcement and imitation learning algorithms. We study and compare the learning and generalization properties of these architectures on a suite of simulated robot manipulation tasks, finding that they achieve significantly higher success rates with less data compared to standard multilayer perceptrons. Structured policies also enable broader and more compositional generalization, producing policies that extrapolate to different numbers of entities than seen in training, and stitch together (i.e., compose) learned skills in novel ways. Video results can be found at https://sites.google.com/view/comp-gen-anon.
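To illustrate why a Deep Sets style policy can handle a varying number of entities, here is a minimal sketch in plain Python. It is not the authors' implementation: the layer sizes, the random initialization, and the helper names (`init_linear`, `linear`, `deep_sets_policy`) are hypothetical, and each entity vector is assumed to already contain whatever per-entity state and goal features the task provides. The sketch only shows the general pattern of a shared per-entity encoder, sum pooling, and a decoder.

```python
import math
import random


def init_linear(rng, n_in, n_out):
    """Random weights (n_out x n_in) and zero bias; purely illustrative."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply an affine map W x + b to a single vector x."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def deep_sets_policy(params, entities):
    """Map a set of per-entity feature vectors to one action vector."""
    phi, rho = params
    # Encode every entity with the same shared network phi.
    encoded = [[math.tanh(v) for v in linear(phi, e)] for e in entities]
    # Sum-pool over entities; the result does not depend on entity order.
    pooled = [math.fsum(column) for column in zip(*encoded)]
    # Decode the pooled summary into an action with rho.
    return linear(rho, pooled)


rng = random.Random(0)
ENTITY_DIM, HIDDEN_DIM, ACTION_DIM = 4, 8, 2  # hypothetical sizes
params = (init_linear(rng, ENTITY_DIM, HIDDEN_DIM),
          init_linear(rng, HIDDEN_DIM, ACTION_DIM))

# Three entities, then the same three plus two more.
three = [[rng.uniform(-1, 1) for _ in range(ENTITY_DIM)] for _ in range(3)]
five = three + [[rng.uniform(-1, 1) for _ in range(ENTITY_DIM)]
                for _ in range(2)]

action = deep_sets_policy(params, three)
shuffled_action = deep_sets_policy(params, list(reversed(three)))
action_five = deep_sets_policy(params, five)
```

Because the pooling step is a sum over entities, the same parameters accept three or five entities and reordering the entities leaves the action unchanged. This is the structural property that makes extrapolation to unseen entity counts possible in principle. Whether a trained policy actually extrapolates is an empirical question that this sketch does not address.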