Abstract: Reinforcement Learning (RL) has shown promising results in learning policies for complex tasks, but often suffers from low sample efficiency and limited transfer. We introduce the Hierarchy of Interaction Skills (HIntS) algorithm, which uses learned interaction detectors to discover and train a hierarchy of skills that manipulate factors in factored environments. Inspired by Granger causality, these unsupervised detectors capture key events between factors, allowing HIntS to learn useful skills sample-efficiently and to transfer those skills to related tasks, settings where many reinforcement learning techniques struggle. We evaluate HIntS on a robotic pushing task with obstacles, a challenging domain where other RL and HRL methods fall short. The learned skills not only demonstrate transfer on variants of Breakout, a common RL benchmark, but also show a 2-3x improvement in both sample efficiency and final performance compared to comparable RL baselines. Overall, HIntS demonstrates a proof of concept for using Granger-causal relationships for skill discovery.
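For intuition on the Granger-causal idea behind the interaction detectors, here is a minimal sketch of a pairwise, linear Granger-style test: factor x is flagged as interacting with factor y when x's history improves prediction of y beyond y's own history. This is only an illustration under simplifying assumptions (linear models, a fixed lag); HIntS learns its detectors rather than using this test, and the function name is invented for the example.

```python
# Illustrative only: a linear Granger-style interaction score, NOT the
# learned detectors used by HIntS.
import numpy as np

def granger_interaction_score(x, y, lag=2):
    """Ratio of prediction error for y with vs. without x's history.

    Values well below 1.0 suggest x "Granger-causes" y, i.e. the two
    factors interact in a predictively useful way.
    """
    T = len(y)
    rows = range(lag, T)
    Y = np.array([y[t] for t in rows])
    # Restricted model: y's own lagged history.
    H_y = np.array([[y[t - k] for k in range(1, lag + 1)] for t in rows])
    # Full model: y's history plus x's history.
    H_xy = np.array([[y[t - k] for k in range(1, lag + 1)] +
                     [x[t - k] for k in range(1, lag + 1)] for t in rows])

    def sse(H):
        coef, *_ = np.linalg.lstsq(H, Y, rcond=None)
        resid = Y - H @ coef
        return float(resid @ resid)

    return sse(H_xy) / sse(H_y)

# Toy example: y is driven almost entirely by x's previous value.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.9 * x[t - 1] + 0.1 * rng.normal()

print(granger_interaction_score(x, y))  # much less than 1: interaction
```

In a factored environment, running such a test over pairs of factors would surface candidate interaction events (e.g. paddle-ball contact in Breakout) around which skills can be trained.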
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=VjCLzhnCXZ
Changes Since Last Submission:
Figure 4 updated with RIDE results.
Figure 3 simplified.
Introduction and conclusion updated to clarify claims around multi-interaction environments.
Appendix I.4 added.
Figures 6-9 added to provide qualitative illustrations of the HIntS skills.
Changes since the original submission:
Figure 1 updated with a skill chain.
Figure 2 split into Figures 2 and 3.
Suggested alterations marked in red text.
Assigned Action Editor: ~Branislav_Kveton1
Submission Number: 1228