- TL;DR: We develop a hierarchical, actor-critic algorithm for compositional transfer by sharing policy components and demonstrate component specialization and related direct benefits in multitask domains as well as its adaptation for single tasks.
- Abstract: The successful application of flexible, general learning algorithms to real-world robotics applications is often limited by their poor data-efficiency. To address the challenge, domains with more than one dominant task of interest encourage the sharing of information across tasks to limit required experiment time. To this end, we investigate compositional inductive biases in the form of hierarchical policies as a mechanism for knowledge transfer across tasks in reinforcement learning (RL). We demonstrate that this type of hierarchy enables positive transfer while mitigating negative interference. Furthermore, we demonstrate the benefits of additional incentives to efficiently decompose task solutions. Our experiments show that these incentives are naturally given in multitask learning and can be easily introduced for single objectives. We design an RL algorithm that enables stable and fast learning of structured policies and the effective reuse of both behavior components and transition data across tasks in an off-policy setting. Finally, we evaluate our algorithm in simulated environments as well as physical robot experiments and demonstrate substantial improvements in data data-efficiency over competitive baselines.
- Keywords: Multitask, Transfer Learning, Reinforcement Learning, Hierarchical Reinforcement Learning, Compositional, Off-Policy
- Original Pdf: pdf