Hierarchical Contrastive Reinforcement Learning: Learning Representations More Suitable for RL Environments
Keywords: reinforcement learning, contrastive learning, actor-critic, representation learning
Abstract: Goal-conditioned reinforcement learning (GCRL) is of significant importance for real-world applications, but its inherently sparse reward structure poses challenges. In recent years, researchers have attempted to learn better representations to solve RL tasks; among these approaches, contrastive reinforcement learning, which is independent of rewards, demonstrates strong performance. However, how to make contrastive learning better suited to reinforcement learning settings remains an open problem. In this paper, instead of directly learning representations of goals in the environment, we present the hierarchical contrastive reinforcement learning (HCRL) method, which reduces the difficulty of learning goal representations by introducing intermediate representations related to states. HCRL embodies the idea that an agent should first understand the environment and then understand the tasks within it. Our method fully utilizes the information available in the GCRL setting and provides a better representation learning method that performs well in a variety of complex, sparse-reward environments without additional assumptions or constraints. Comparison experiments show that HCRL converges faster and achieves higher success rates than prior work on a range of goal-conditioned RL tasks. We further conduct ablation studies and additional evaluations to validate our method. Anonymous code: https://anonymous.4open.science/r/HCRL-6E88.
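The hierarchical idea in the abstract can be made concrete with a small sketch. Below is a minimal, hypothetical illustration of a two-level contrastive objective, assuming an InfoNCE-style setup with future states as positives, as in reward-free contrastive RL: a state-level loss that pulls a state's representation toward states reached later on the same trajectory ("understand the environment"), and a goal-level loss built on the state encoder that aligns state representations with the commanded goal ("understand the task"). All names (Encoder, info_nce, hcrl_loss) and the exact loss composition are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of a two-level (hierarchical) contrastive objective.
# Assumes an InfoNCE setup with future states as positives, as in
# reward-free contrastive RL; the actual HCRL losses may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, in_dim: int, repr_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, repr_dim),
        )

    def forward(self, x):
        return self.net(x)

def info_nce(anchors, positives):
    # Batch-wise InfoNCE: row i of `positives` is the positive for
    # row i of `anchors`; all other rows serve as negatives.
    logits = anchors @ positives.T                        # (B, B) similarities
    labels = torch.arange(len(anchors), device=anchors.device)
    return F.cross_entropy(logits, labels)

state_enc = Encoder(in_dim=17)   # hypothetical observation dimension
goal_enc = Encoder(in_dim=17)    # goals live in observation space here

def hcrl_loss(s, s_future, g):
    # Level 1 (state): pull a state toward states reached later on the
    # same trajectory -- "first understand the environment".
    z_s = state_enc(s)
    state_loss = info_nce(z_s, state_enc(s_future))
    # Level 2 (goal): align (detached) state representations with the
    # commanded goal's representation -- "then understand the task".
    goal_loss = info_nce(z_s.detach(), goal_enc(g))
    return state_loss + goal_loss

# Example usage on a random batch:
s, s_future, g = (torch.randn(32, 17) for _ in range(3))
loss = hcrl_loss(s, s_future, g)
loss.backward()
```

The detach on the state representations in the goal-level term reflects the "first states, then goals" ordering described in the abstract: gradients from the goal loss update only the goal encoder, while the state encoder is trained by the state-level loss alone. Whether HCRL actually stops gradients this way is an assumption of this sketch.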
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 2141