Learning Diverse Sub-Policies via a Task-Agnostic Regularization on Action Distributions

Liangyu Huo, Zulin Wang, Mai Xu, Yuhang Song

ICASSP 2020 (modified: 03 Nov 2022)

Abstract: Automatic sub-policy discovery has recently received much attention in hierarchical reinforcement learning (HRL). Conventional approaches to learning sub-policies often collapse, with a single sub-policy dominating the whole task, because they lack techniques to ensure diversity among sub-policies. In this paper, we formulate the discovery of diverse sub-policies as a trajectory inference problem. We then propose an information-theoretic objective based on action distributions to encourage diversity. Moreover, we derive two simplifications, one for discrete and one for continuous action spaces, to reduce computation. Finally, experimental results on two different HRL domains show that the proposed approach further improves state-of-the-art approaches without modifying their existing hyperparameters, suggesting the wide applicability and robustness of our approach.
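
The abstract does not state the exact form of the information-theoretic objective, so the following is only an illustrative sketch of what a diversity measure on discrete action distributions could look like, not the authors' formulation. It assumes a generalized Jensen-Shannon divergence (the entropy of the averaged action distribution minus the average per-sub-policy entropy), which is zero when all sub-policies act identically at a state and grows as they diverge. The function names `entropy` and `diversity_bonus` are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution given as a list."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def diversity_bonus(action_dists):
    """Generalized Jensen-Shannon divergence among sub-policy action distributions.

    ASSUMPTION: this is a stand-in diversity measure chosen for illustration;
    the paper's actual objective is not given in the abstract.

    action_dists: list of K categorical distributions (one per sub-policy),
    all over the same discrete action set and evaluated at the same state.
    """
    k = len(action_dists)
    n_actions = len(action_dists[0])
    # Uniform mixture of the K sub-policies' action distributions.
    mixture = [sum(d[a] for d in action_dists) / k for a in range(n_actions)]
    # Average entropy of the individual sub-policies.
    mean_entropy = sum(entropy(d) for d in action_dists) / k
    # H(mixture) - mean H(p_k): 0 if all distributions coincide.
    return entropy(mixture) - mean_entropy


if __name__ == "__main__":
    collapsed = [[0.5, 0.5], [0.5, 0.5]]  # both sub-policies behave identically
    diverse = [[1.0, 0.0], [0.0, 1.0]]    # sub-policies pick different actions
    print(diversity_bonus(collapsed))
    print(diversity_bonus(diverse))
```

Under this assumed measure, a regularizer would add the bonus (scaled by some coefficient) to the training objective so that collapsed sub-policies receive no bonus while distinct ones are rewarded.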
