Priors, Hierarchy, and Information Asymmetry for Skill Transfer in Reinforcement Learning

Sasha Salter; Kristian Hartikainen; Walter Goodwin; Ingmar Posner

Published: 01 Feb 2023, Last Modified: 28 Feb 2023, ICLR 2023 poster, Readers: Everyone

Keywords: Skills, Transfer Learning, Reinforcement Learning

TL;DR: We introduce 'Attentive Priors for Expressive and Transferable Skills' (APES), a hierarchical KL-regularized skill transfer method that automates the choice of information asymmetry, thereby maximising transfer benefits.

Abstract: The ability to discover behaviours from past experience and transfer them to new tasks is a hallmark of intelligent agents acting sample-efficiently in the real world. Equipping embodied reinforcement learners with the same ability may be crucial for their successful deployment in robotics. While hierarchical and KL-regularized reinforcement learning individually hold promise here, a hybrid approach could arguably combine their respective benefits. Key to these fields is the use of information asymmetry across architectural modules to bias which skills are learnt. While the choice of asymmetry has a large influence on transferability, existing methods base this choice primarily on intuition, in a domain-independent and potentially sub-optimal manner. In this paper, we theoretically and empirically show the crucial expressivity-transferability trade-off of skills across sequential tasks, controlled by information asymmetry. Given this insight, we introduce Attentive Priors for Expressive and Transferable Skills (APES), a hierarchical KL-regularized method that benefits heavily from both priors and hierarchy. Unlike existing approaches, APES automates the choice of asymmetry by learning it in a data-driven, domain-dependent way based on our expressivity-transferability theorems. Experiments on complex transfer domains with varying levels of extrapolation and sparsity, such as robot block stacking, demonstrate that choosing the correct asymmetry is critical, with APES drastically outperforming previous methods.
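To make "KL-regularized" and "information asymmetry" concrete, here is a minimal sketch, not the APES implementation, of a per-step regularized reward r − α·KL(π(·|s) ‖ π₀(·|g(s))): the policy π sees the full state s, while the prior π₀ sees only a gated view g(s). The per-feature sigmoid gate, the linear Gaussian action heads, and the values of `alpha` and `sigma` are all illustrative assumptions; APES itself learns which information the prior receives via attention, which this toy gate only loosely stands in for.

```python
import math
from typing import List, Sequence


def gaussian_kl(mu_p: Sequence[float], sigma_p: Sequence[float],
                mu_q: Sequence[float], sigma_q: Sequence[float]) -> float:
    """KL(N(mu_p, diag sigma_p^2) || N(mu_q, diag sigma_q^2)), summed over dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gate_inputs(state: Sequence[float], gate_logits: Sequence[float]) -> List[float]:
    """Hypothetical information-asymmetry gate: scales each state feature by sigmoid(logit).
    A logit near -inf hides that feature from the prior; near +inf passes it through."""
    return [s * sigmoid(l) for s, l in zip(state, gate_logits)]


def linear_mean(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Action mean of a hypothetical linear Gaussian head: mu = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def kl_regularized_reward(reward: float, state: Sequence[float],
                          policy_w, prior_w, gate_logits,
                          sigma: float = 0.5, alpha: float = 0.1) -> float:
    """r - alpha * KL(pi(.|s) || pi0(.|g(s))) with a shared, fixed action std."""
    mu_pi = linear_mean(policy_w, state)                           # policy: full state
    mu_0 = linear_mean(prior_w, gate_inputs(state, gate_logits))  # prior: gated state
    sig = [sigma] * len(mu_pi)
    return reward - alpha * gaussian_kl(mu_pi, sig, mu_0, sig)


if __name__ == "__main__":
    s, w = [1.0, 2.0], [[1.0, 0.5]]
    # Prior sees everything the policy sees: no penalty.
    print(kl_regularized_reward(1.0, s, w, w, [50.0, 50.0]))
    # Prior is blind to the second feature: its mean shifts, so the KL penalty grows.
    print(kl_regularized_reward(1.0, s, w, w, [50.0, -50.0]))
```

The two calls show the trade-off in miniature: the more information the prior is denied, the less it can imitate the policy in the current task, but the more general (and potentially transferable) the behaviour it encodes becomes.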

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics

Submission Guidelines: Yes

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)
