Learning Interpretable Options by Identifying Reward Diffusion Bottlenecks in Reinforcement Learning

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: hierarchical reinforcement learning, value power strength, option discovery, bottleneck identification
TL;DR: Value Power Strength (VPS) leverages reward diffusion from random-reward value functions to identify bottlenecks in discrete/continuous spaces, enabling option discovery with meaningful subgoals and improved exploration.
Abstract: Bottleneck states, which connect distinct regions of the state space, provide a principled and interpretable basis for constructing temporal abstractions in Hierarchical Reinforcement Learning (HRL). However, existing bottleneck identification methods rely primarily on topological analysis of the state-transition graph, which limits their scalability to high-dimensional or continuous domains. To address this challenge, we introduce Value Power Strength (VPS), a value function-based metric inspired by the analogy between the Bellman equation and Kirchhoff’s current law, which quantifies the bottleneck property of a state via the diffusion of reward in Markov Decision Processes (MDPs). VPS is estimated efficiently from value functions learned under random reward signals, and it captures reward-diffusion bottlenecks in both discrete and continuous state spaces. Leveraging VPS, we design options that guide agents toward or away from bottleneck regions. Experimental results on classic tabular domains, visual GridWorld, and Atari 2600 games demonstrate that the VPS-based framework discovers semantically meaningful subgoals and substantially improves exploration efficiency.
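The abstract's core recipe (evaluate value functions under random rewards, then look for states where reward diffusion concentrates) can be illustrated with a toy tabular sketch. Note this is an assumption-laden illustration, not the paper's exact VPS definition: the two-room layout, the random-walk policy, and the neighbor-gap scoring heuristic below are all our own stand-ins for the unspecified details.

```python
import numpy as np

# Toy MDP: two 3x3 rooms joined by a single "door" cell (index 9).
# In the spirit of the Bellman/Kirchhoff analogy, we evaluate value
# functions for many random reward vectors and score each state by the
# mean absolute value gap to its neighbors -- a crude "current" proxy.

def room_edges(base):
    """Grid-adjacency edges for a 3x3 room whose cells start at `base`."""
    edges = []
    for r in range(3):
        for c in range(3):
            i = base + r * 3 + c
            if c < 2: edges.append((i, i + 1))  # right neighbor
            if r < 2: edges.append((i, i + 3))  # down neighbor
    return edges

DOOR = 9
edges = room_edges(0) + room_edges(10) + [(5, DOOR), (DOOR, 13)]
n = 19
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
P = A / A.sum(axis=1, keepdims=True)      # uniform random-walk policy

gamma, K = 0.95, 300
rng = np.random.default_rng(0)
M = np.linalg.inv(np.eye(n) - gamma * P)  # closed-form policy evaluation

scores = np.zeros(n)
for _ in range(K):
    v = M @ rng.standard_normal(n)        # value fn of one random reward
    for i in range(n):
        nbrs = np.flatnonzero(A[i])
        scores[i] += np.abs(v[i] - v[nbrs]).mean()
scores /= K

print(np.argsort(scores)[-3:])  # the door cell should rank near the top
```

Within each room, random-reward values smooth out, while the gap between the two rooms' value levels must pass through the door, so the door cell accumulates the largest average value gradient. This is the intuition the abstract describes; the paper's actual estimator for continuous and visual domains is necessarily different.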
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 10132