AUTOMATA GUIDED HIERARCHICAL REINFORCE- MENT LEARNING FOR ZERO-SHOT SKILL COMPOSI- TION

Anonymous

Nov 07, 2017 (modified: Nov 07, 2017) ICLR 2018 Conference Blind Submission readers: everyone Show Bibtex
  • Abstract: An obstacle that prevents the wide adoption of (deep) reinforcement learning (RL) in control systems is its need for a large amount of interactions with the environ- ment in order to master a skill. The learned skill usually generalizes poorly across domains and re-training is often necessary when presented with a new task. We present a framework that combines methods in temporal logic (TL) with hierar- chical reinforcement learning (HRL). The set of techniques we provide allows for convenient specification of tasks with complex logic, learn hierarchical policies (meta-controller and low-level controllers) with well-defined intrinsic rewards us- ing any RL methods and is able to construct new skills from existing ones without additional learning. We evaluate the proposed methods in a simple grid world simulation as well as simulation on a Baxter robot. An obstacle that prevents the wide adoption of (deep) reinforcement learning (RL) in control systems is its need for a large amount of interactions with the environ- ment in order to master a skill. The learned skill usually generalizes poorly across domains and re-training is often necessary when presented with a new task. We present a framework that combines methods in temporal logic (TL) with hierar- chical reinforcement learning (HRL). The set of techniques we provide allows for convenient specification of tasks with complex logic, learn hierarchical policies (meta-controller and low-level controllers) with well-defined intrinsic rewards us- ing any RL methods and is able to construct new skills from existing ones without additional learning. We evaluate the proposed methods in a simple grid world simulation as well as simulation on a Baxter robot.
  • TL;DR: Combine temporal logic with hierarchical reinforcement learning for skill composition
  • Keywords: Hierarchical Reinforcement learning, temporal logic, skill composition

Loading