From Connectivity to Rewards: Dense Reward Learning with Directed State Graphs

TMLR Paper8632 Authors

27 Apr 2026 (modified: 26 May 2026). Under review for TMLR. License: CC BY 4.0
Abstract: The integration of graphs with Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) has received increasing attention, as graphs naturally encode task hierarchies for effective subgoal sampling. However, existing methods often overlook intrinsic connectivity information and therefore fail to fully leverage the underlying topology for efficient learning: most graph-based GCHRL methods use the graph as a stochastic sampling tool rather than as an environmental model that encodes connectivity and state-accessibility information. This limitation is particularly acute in quasimetric environments, where the inherent asymmetry of state transitions poses a fundamental challenge to stable policy learning and robust path planning. In this paper, we address these problems by introducing a state connectivity model that predicts pairwise state connectivity strength in asymmetric environments. We transform these connectivity strengths into scalar auxiliary dense rewards that provide continuous guidance across multiple hierarchical levels. The state connectivity model is implemented efficiently as a neural network trained on a directed state graph generated during exploration, and we demonstrate that the resulting framework, Graph-Guided Quasimetric Dense Reward (G2QDR), can be integrated into any existing GCHRL architecture to boost performance. Empirical results across a wide range of sparse-reward environments indicate that G2QDR significantly enhances the performance of baseline GCHRL approaches with minimal computational overhead.
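The abstract does not specify how connectivity strength is computed or how it is turned into a reward. As a rough illustration only, the sketch below assumes a tabular stand-in: it builds a directed state graph from observed transitions, scores connectivity from a state s to a goal g as gamma ** d(s, g), where d is the directed hop distance (so d(s, g) need not equal d(g, s)), and uses the one-step change in that score as an auxiliary dense reward. The names DirectedStateGraph, connectivity, and dense_reward, the gamma parameter, and the difference-of-scores reward are hypothetical choices, not G2QDR's actual design. In the paper, a neural network trained on the graph plays the role that the exact BFS lookup plays here, which would let it score state pairs the graph has never linked.

```python
from collections import defaultdict, deque


class DirectedStateGraph:
    """Hypothetical directed graph over discrete states, built from exploration transitions."""

    def __init__(self):
        # Adjacency list of outgoing edges: succ[s] is the set of states reached from s.
        self.succ = defaultdict(set)

    def add_transition(self, s, s_next):
        # Edges are directed: observing s -> s_next says nothing about s_next -> s.
        if s != s_next:
            self.succ[s].add(s_next)

    def hop_distances_from(self, source):
        # Breadth-first search over outgoing edges gives directed shortest-path
        # lengths d(source, v) for every state v reachable from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in self.succ[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist


def connectivity(graph, s, g, gamma=0.9):
    """Hypothetical connectivity strength in [0, 1]: gamma ** d(s, g), or 0.0 if g is unreachable."""
    d = graph.hop_distances_from(s).get(g)
    return 0.0 if d is None else gamma ** d


def dense_reward(graph, s, s_next, g, gamma=0.9):
    """Hypothetical auxiliary reward: change in connectivity toward goal g after one step."""
    return connectivity(graph, s_next, g, gamma) - connectivity(graph, s, g, gamma)


if __name__ == "__main__":
    # A one-way loop 0 -> 1 -> 2 -> 3 -> 0: reaching 3 from 0 takes three hops,
    # but reaching 0 from 3 takes one, so connectivity is asymmetric.
    graph = DirectedStateGraph()
    for a, b in [(0, 1), (1, 2), (2, 3), (3, 0)]:
        graph.add_transition(a, b)
    print(connectivity(graph, 0, 3), connectivity(graph, 3, 0))
    print(dense_reward(graph, 0, 1, 3))  # positive: the step moves closer to the goal
```

The difference-of-scores form is an undiscounted instance of potential-based reward shaping, one common way to add dense rewards without changing which policies are optimal; whether G2QDR uses this form is not stated in the abstract.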
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Erin_J_Talvitie1
Submission Number: 8632