Causality-driven Hierarchical Structure Discovery for Reinforcement Learning

Published: 31 Oct 2022, Last Modified: 27 Dec 2022, NeurIPS 2022 Accept, Readers: Everyone
Keywords: hierarchical reinforcement learning, causal discovery, causality, subgoal
Abstract: Hierarchical reinforcement learning (HRL) has proven effective for tasks with sparse rewards because it can improve the agent's exploration efficiency by discovering high-quality hierarchical structures (e.g., subgoals or options). However, automatically discovering high-quality hierarchical structures remains a great challenge. Previous HRL methods can only find hierarchical structures in simple environments, as they rely mainly on the randomness of the agent's policies during exploration. In complicated environments, such a randomness-driven exploration paradigm can hardly discover high-quality hierarchical structures because of its low exploration efficiency. In this paper, we propose CDHRL, a causality-driven hierarchical reinforcement learning framework, to build high-quality hierarchical structures efficiently in complicated environments. The key insight is that the causalities among environment variables are a natural fit for modeling reachable subgoals and their dependencies; thus, causality is suitable guidance for building high-quality hierarchical structures. Roughly, we build the hierarchy of subgoals autonomously based on causality, and use the subgoal-based policies to uncover further causality efficiently. CDHRL therefore relies on causality-driven discovery instead of randomness-driven exploration to construct high-quality hierarchical structures. Results in two complex environments, 2D-Minecraft and Eden, show that CDHRL can discover high-quality hierarchical structures and significantly enhance exploration efficiency.
TL;DR: We propose a Causality-Driven Hierarchical Reinforcement Learning (CDHRL) framework, which uses causality in the environment as guidance to discover a high-quality subgoal hierarchy.
Supplementary Material: zip
