Policy Architectures for Compositional Generalization in Control

28 May 2022, 15:03 (modified: 21 Jul 2022, 01:30) · SCIS 2022 Poster
Keywords: mdp, invariance, rl
Abstract: Several tasks in control, robotics, and planning can be specified through desired goal configurations for entities in the environment. Learning goal-conditioned policies is a natural paradigm for solving such tasks. Current approaches, however, struggle to learn and generalize as task complexity increases, for example due to variations in the number of entities or in the composition of goals. To overcome these challenges, we first introduce the Entity-Factored Markov Decision Process (EFMDP), a formal framework for modeling the entity-based compositional structure in control tasks. We then outline policy architecture choices that can successfully leverage the geometric properties of the EFMDP model. Our framework theoretically motivates the use of Self-Attention and Deep Set architectures for control, and yields flexible policies that can be trained end-to-end with standard reinforcement and imitation learning algorithms. On a suite of simulated robot manipulation tasks, we find that these architectures achieve significantly higher success rates with less data than the standard multilayer perceptron. Our structured policies also enable broader and more compositional generalization, producing policies that extrapolate to different numbers of entities than seen in training and stitch together (i.e., compose) learned skills in novel ways. Video results can be found at https://sites.google.com/view/comp-gen-anon.
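The Deep Set idea referenced in the abstract can be illustrated with a minimal sketch: encode each entity independently, pool with a permutation-invariant sum, then decode an action. All names, network sizes, and dimensions below are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def mlp(x, layers):
    """Apply a small tanh MLP given a list of (W, b) layers."""
    for W, b in layers[:-1]:
        x = np.tanh(x @ W + b)
    W, b = layers[-1]
    return x @ W + b

def init_mlp(sizes, rng):
    """Randomly initialize (W, b) pairs for the given layer sizes."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

rng = np.random.default_rng(0)
entity_dim, hidden, action_dim = 6, 32, 4           # hypothetical sizes
phi = init_mlp([entity_dim, hidden, hidden], rng)   # per-entity encoder
rho = init_mlp([hidden, hidden, action_dim], rng)   # post-pooling decoder

def deep_set_policy(entities):
    # entities: (num_entities, entity_dim). Sum-pooling makes the action
    # invariant to entity ordering and lets the same weights handle any
    # number of entities -- the property the paper exploits.
    pooled = mlp(entities, phi).sum(axis=0)
    return mlp(pooled, rho)

obs = rng.standard_normal((3, entity_dim))
action = deep_set_policy(obs)
action_permuted = deep_set_policy(obs[[2, 0, 1]])   # reorder entities
assert np.allclose(action, action_permuted)         # same action either way
```

The same policy runs unchanged on observations with more entities (e.g. a `(5, entity_dim)` array), which is the mechanism behind the extrapolation claim; the trained quality of such extrapolation is of course an empirical question.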