Abstract: Humans can decompose previous experiences into skills and reuse them to enable fast learning in the future. Inspired by this process, we propose a new model called Option-Controller Network (OCN), a bi-level recurrent policy network composed of a high-level controller and a pool of low-level options. The options are disconnected from any task-specific information so that they model task-agnostic skills, while the controller uses the options to solve a given task. With this isolation of information and the synchronous calling mechanism, we can impose a division of work between the controller and the options in an end-to-end training regime. In experiments, we first perform behavior cloning from unstructured demonstrations of different tasks. We then freeze the learned options and learn a new controller to solve a new task. Extensive results on discrete and continuous environments show that OCN can jointly learn to decompose unstructured demonstrations into skills and model each skill with a separate option. The learned options provide a good temporal abstraction, allowing OCN to quickly transfer to tasks with a novel combination of learned skills even with sparse reward, whereas previous methods suffer from the delayed reward problem due to the lack of temporal abstraction or a complicated option-controlling mechanism.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=q1xha76ElA&referrer=%5BAuthor%20Console%5D(%2Fgroup%3Fid%3DTMLR%2FAuthors%23your-submissions)
Changes Since Last Submission: We fixed the format.
Assigned Action Editor: ~Matthieu_Geist1
Submission Number: 1230