Keywords: reinforcement learning, exploration, intrinsic motivation, surprise, empowerment, contrastive learning
TL;DR: Temporal contrastive learning can be used to drive exploration.
Abstract: Exploration remains a key challenge in reinforcement learning (RL), especially in long-horizon tasks and environments with high-dimensional observations. A common strategy for effective exploration is to promote state coverage or novelty, which often involves estimating the agent's state visitation distribution. In this paper, we propose Curiosity-Driven Exploration via Temporal Contrastive Learning (C-TeC), an exploration method based on temporal contrastive learning that rewards agents for reaching states with unexpected futures. This incentivizes the agent to uncover meaningful, less-visited states. C-TeC is simple and does not require explicit density or uncertainty estimation, and it learns representations aligned with the RL objective. It consistently outperforms standard baselines in complex mazes with different embodiments (Ant and Humanoid) and in robotic manipulation tasks, and it yields more diverse behaviors in Craftax without requiring task-specific information.
Submission Number: 26