Inducing Reusable Skills From Demonstrations with Option-Controller Network

Published: 28 Jan 2022 · Last Modified: 13 Feb 2023 · ICLR 2022 Submitted · Readers: Everyone
Keywords: Reusable Skill, Option-Controller Network
Abstract: Humans can decompose previous experiences into skills and reuse them to enable fast learning in the future. Inspired by this process, we propose a new model called the Option-Controller Network (OCN), a bi-level recurrent policy network composed of a high-level controller and a pool of low-level options. The options are disconnected from any task-specific information in order to model task-agnostic skills. The controller uses options to solve a given task: it calls one option at a time and waits until that option returns. With this isolation of information and the synchronous calling mechanism, we can impose a division of work between the controller and the options in an end-to-end training regime. In experiments, we first perform behavior cloning on unstructured demonstrations collected from different tasks. We then freeze the learned options and train a new controller with an RL algorithm to solve a new task. Extensive results on discrete and continuous environments show that OCN can jointly learn to decompose unstructured demonstrations into skills and to model each skill with a separate option. The learned options provide a good temporal abstraction, allowing OCN to transfer quickly to tasks requiring a novel combination of learned skills, even under sparse reward, whereas previous methods either suffer from the delayed-reward problem due to a lack of temporal abstraction or rely on a complicated option-controlling mechanism that increases the complexity of exploration.
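The synchronous calling mechanism described in the abstract can be illustrated with a minimal sketch. All class and variable names here are hypothetical, chosen for illustration; they are not taken from the paper's implementation, and a real OCN would learn both the controller's option choices and each option's action/termination policy.

```python
# Hypothetical sketch of OCN's bi-level structure and synchronous calling
# mechanism. Names (Option, Controller, etc.) are illustrative only.

class Option:
    """A task-agnostic low-level skill: emits primitive actions until it
    terminates, then returns control to the controller. Here the skill is
    a fixed action sequence for simplicity; in OCN it is a learned policy
    with a learned termination condition."""
    def __init__(self, name, actions):
        self.name = name
        self.actions = actions

    def run(self, env_step):
        # Execute this option's primitive actions; returning from this
        # method models the option's termination signal.
        for a in self.actions:
            env_step(a)

class Controller:
    """High-level policy: calls one option at a time and blocks until
    that option returns, mirroring the synchronous calling mechanism."""
    def __init__(self, options):
        self.options = options

    def solve(self, plan, env_step):
        trace = []
        for idx in plan:  # a learned controller would pick idx from state
            opt = self.options[idx]
            trace.append(opt.name)
            opt.run(env_step)  # synchronous call: wait for the option
        return trace

# Toy usage: two "skills" composed by the controller.
options = [Option("pick", ["reach", "grasp"]),
           Option("place", ["move", "release"])]
executed = []
trace = Controller(options).solve([0, 1], executed.append)
# trace -> ['pick', 'place']
# executed -> ['reach', 'grasp', 'move', 'release']
```

The key design point the sketch mirrors is that the controller never interleaves its decisions with an option's primitive actions: it makes exactly one choice per option call, which is what gives the temporal abstraction credited in the abstract.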
One-sentence Summary: We introduce an Option-Controller Network to induce reusable skills.