Keywords: Goal-conditioned RL, Emergent exploration, Contrastive RL, Cognitive Interpretability
Abstract: In this work, we take a first step toward uncovering the underlying dynamics of emergent exploration in unsupervised reinforcement learning. We study Single-Goal Contrastive RL (SGCRL), which can solve challenging robotic manipulation tasks without external rewards or curricula. Drawing on methods from cognitive science, we combine theoretical analysis of the algorithm's objective function with controlled experiments to better understand its behavioral drivers. We show that SGCRL implicitly maximizes rewards shaped by its learned representations. The contrastive representations adapt the reward landscape to promote exploration before the goal is reached and exploitation thereafter. We also build a simple model of the algorithm without function approximation, isolating the essential components responsible for its exploratory behavior. Finally, we establish connections between SGCRL's exploration dynamics and classical exploration methods, including R-MAX and PSRL.
Submission Number: 39