Granger-Causal Hierarchical Skill Discovery

TMLR Paper1228 Authors

02 Jun 2023 (modified: 04 Aug 2023), Rejected by TMLR
Abstract: Reinforcement Learning (RL) has shown promising results in learning policies for complex tasks, but often suffers from low sample efficiency and limited transfer. We introduce the Hierarchy of Interaction Skills (HIntS) algorithm, which uses learned interaction detectors to discover and train a hierarchy of skills that manipulate factors in factored environments. Inspired by Granger causality, these unsupervised detectors capture key events between factors, which lets HIntS learn useful skills sample-efficiently and transfer those skills to other related tasks, where many reinforcement learning techniques struggle. We evaluate HIntS on a robotic pushing task with obstacles, a challenging domain where other RL and HRL methods fall short. The learned skills not only demonstrate transfer using variants of Breakout, a common RL benchmark, but also show a 2-3x improvement in both sample efficiency and final performance compared to comparable RL baselines. Together, these results make HIntS a proof of concept for using Granger-causal relationships for skill discovery.
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=VjCLzhnCXZ
Changes Since Last Submission: Figure 4 updated with RIDE results. Figure 3 updated to be simpler. Updated intro and conclusion to clarify claims around multi-interaction environments. Added Appendix I.4. Added Figures 6-9 to address qualitative illustrations of the HIntS skills. (Since original submission): Figure 1 updated with a skill chain. Figure 2 has been broken into Figures 2 and 3. Added suggested alterations in red text.
Assigned Action Editor: ~Branislav_Kveton1
Submission Number: 1228