SLA-v3: Spatial Linkability-Aware and Novelty-Encouraging State Heuristic for Exploration

17 Sept 2025 (modified: 15 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: RL, Exploration, Intrinsic Motivation, Sparse Reward
TL;DR: We propose SLA-v3, a novel intrinsic motivation method that quantifies state traversal difficulty to address the Detachment-Derailment problem
Abstract: Efficient exploration remains a pivotal challenge in reinforcement learning (RL), particularly in environments with sparse rewards. While intrinsic motivation (IM) has proven effective for hard exploration tasks, current IM approaches often struggle with the detachment-derailment (D-D) problem, which significantly curtails their effectiveness in settings with extremely sparse rewards. Although methods like Go-Explore address D-D by explicitly archiving states to ensure revisitation, their dependence on state restoration limits their applicability in procedurally generated environments. In this paper, we argue that the root cause of the D-D problem lies in the underlying topological transition structure of the environment. Specifically, we observe that certain states become persistently difficult to traverse and revisit reliably under exploratory noise. To overcome this, we introduce a novel IM framework centered on state traversal difficulty. Within this framework, we propose the $\textbf{S}$patial $\textbf{L}$inkability-$\textbf{A}$ware $\textbf{a}$nd $\textbf{N}$ovelty-$\textbf{E}$ncouraging $\textbf{S}$tate $\textbf{H}$euristic ($\textbf{SLAANESH}$), abbreviated as $\textbf{SLA-v3}$. SLA-v3 tackles the D-D problem by using the shortest-path quasi-metric from the initial state ($S_0$) as a heuristic for traversal difficulty. This mechanism generates sustained exploratory incentives, particularly encouraging visits to hard-to-traverse states. Furthermore, SLA-v3 integrates a novelty detector that warms up the heuristic and prevents stagnation in unproductive dead-end paths. Extensive experiments on MiniGrid and challenging Atari environments (Pitfall! and Montezuma's Revenge) demonstrate the superior efficacy of SLA-v3.
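The core mechanism in the abstract — a shortest-path quasi-metric from $S_0$ over observed transitions, combined with a novelty bonus for warm-up — can be sketched as follows. This is a minimal illustrative interpretation, not the paper's implementation: the class name, the count-based novelty term, and the `beta`/`novelty_coef` weights are assumptions for exposition.

```python
from collections import defaultdict, deque


class TraversalDifficultyBonus:
    """Hypothetical sketch of a traversal-difficulty intrinsic reward.

    States harder to reach from the initial state s0 (longer shortest
    path over the empirically observed, directed transition graph, hence
    a quasi-metric) receive a larger bonus; a simple count-based novelty
    term warms up the heuristic early in training.
    """

    def __init__(self, beta=0.1, novelty_coef=1.0):
        self.adj = defaultdict(set)     # observed directed transitions
        self.counts = defaultdict(int)  # visit counts for novelty term
        self.s0 = None                  # initial state, set on first update
        self.beta = beta
        self.novelty_coef = novelty_coef

    def _shortest_dist(self, target):
        # BFS from s0 over observed transitions; directed edges make this
        # a quasi-metric (d(s0, s) need not equal d(s, s0)).
        if self.s0 is None or target == self.s0:
            return 0
        dist = {self.s0: 0}
        frontier = deque([self.s0])
        while frontier:
            s = frontier.popleft()
            for t in self.adj[s]:
                if t not in dist:
                    dist[t] = dist[s] + 1
                    if t == target:
                        return dist[t]
                    frontier.append(t)
        return len(dist)  # not yet reachable: treat as maximally distant

    def update(self, s, s_next):
        """Record one transition and return the intrinsic bonus for s_next."""
        if self.s0 is None:
            self.s0 = s
        self.adj[s].add(s_next)
        self.counts[s_next] += 1
        novelty = self.novelty_coef / (self.counts[s_next] ** 0.5)
        difficulty = self.beta * self._shortest_dist(s_next)
        return difficulty + novelty
```

For example, on a chain of transitions 0 → 1 → 2, the bonus for state 2 exceeds that for state 1 because its shortest path from $S_0$ is longer; as visit counts grow, the novelty term decays and the distance heuristic dominates, which is the warm-up behavior the abstract describes.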
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 9129