Hierarchical Reinforcement Learning with Targeted Causal Interventions

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
Abstract: Hierarchical reinforcement learning (HRL) improves the efficiency of long-horizon reinforcement-learning tasks with sparse rewards by decomposing the task into a hierarchy of subgoals. The main challenge in HRL is efficiently discovering the hierarchical structure among subgoals and utilizing this structure to achieve the final goal. We address this challenge by modeling the subgoal structure as a causal graph and propose a causal discovery algorithm to learn it. Additionally, rather than intervening on the subgoals at random during exploration, we harness the discovered causal model to prioritize subgoal interventions based on their importance in attaining the final goal. These targeted interventions yield a significantly more efficient policy in terms of training cost. Unlike previous work on causal HRL, which lacks theoretical analysis, we provide a formal analysis of the problem. Specifically, for tree structures and for a variant of Erdős-Rényi random graphs, our approach results in substantial improvements. Our experiments on HRL tasks also show that the proposed framework outperforms existing methods in terms of training cost.
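As a rough illustration of the targeted-intervention idea (not the authors' implementation; the toy Minecraft-style graph, the `importance` heuristic, and the sampling weights below are all assumptions for exposition), the following minimal Python sketch takes an already-discovered causal graph over subgoals and samples intervention targets weighted by how many goal-relevant subgoals they unlock, instead of sampling uniformly at random:

```python
# Hypothetical sketch: prioritizing subgoal interventions using a
# discovered causal graph over subgoals. The graph, scores, and
# weighting scheme are illustrative assumptions, not the paper's code.
import random
import networkx as nx

# Toy causal graph: an edge u -> v means achieving subgoal u enables
# progress on subgoal v. "pickaxe" is the final goal.
g = nx.DiGraph()
g.add_edges_from([
    ("wood", "planks"),
    ("planks", "crafting_table"),
    ("planks", "sticks"),
    ("crafting_table", "pickaxe"),
    ("sticks", "pickaxe"),
    ("stone", "furnace"),  # branch irrelevant to this goal
])
goal = "pickaxe"

# Only ancestors of the goal can causally influence it.
relevant = nx.ancestors(g, goal) | {goal}

def importance(node):
    # Illustrative score: how many goal-relevant subgoals this node
    # unlocks downstream.
    return len(nx.descendants(g, node) & relevant)

def sample_intervention():
    # Sample the next intervention target weighted by importance,
    # rather than uniformly over all subgoals.
    candidates = sorted(relevant - {goal})
    weights = [importance(n) + 1 for n in candidates]  # +1 avoids zero weight
    return random.choices(candidates, weights=weights, k=1)[0]

if __name__ == "__main__":
    for n in sorted(relevant - {goal}):
        print(f"{n}: importance={importance(n)}")
    print("next intervention target:", sample_intervention())
```

In this toy graph, "wood" scores highest because every goal-relevant subgoal lies downstream of it, while "stone" and "furnace" are never intervened on at all; this mirrors the paper's point that concentrating interventions on subgoals that matter for the final goal avoids wasted exploration.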
Lay Summary: Many real-world tasks, such as building a tool in Minecraft, require achieving several intermediate milestones, like collecting wood or crafting a pickaxe, before any reward is obtained. This makes it difficult for agents to determine which milestones are important. We investigated whether viewing these milestones as a chain of causes and effects could help. In our approach, each milestone is represented as a node in a causal graph, allowing us to uncover the relationships between them. Through targeted experiments, the agent learns which milestones enable progress toward others. Armed with this knowledge, the agent can then focus its efforts on the milestones that have the greatest impact on the final reward, rather than exploring all milestones at random. In both simple tests and a challenging environment (Minecraft), our method enabled the agent to reach its goal faster than leading alternatives. We also provide theoretical analysis and guarantees that explain how and why our approach can lead to improvements. This approach could make it easier for agents to solve complex, multi-step tasks, both in games and in real-world scenarios.
Link To Code: https://github.com/sadegh16/HRC
Primary Area: General Machine Learning->Causality
Keywords: Structural Causal Models, Causal Discovery, Hierarchical Reinforcement Learning, Intervention, Subgoal Discovery
Submission Number: 11013