Mastering Visual Reinforcement Learning via Positive Unlabeled Policy-Guided Contrast

Published: 2025 · Last Modified: 05 Jan 2026 · ICIC (12) 2025 · CC BY-SA 4.0
Abstract: Reinforcement learning (RL) has garnered considerable attention in recent years. A fundamental yet challenging problem in this paradigm is the effective perception of high-dimensional environmental information, which has led to the emergence of Visual Reinforcement Learning (VRL). This sub-field focuses on learning representations from pixel observations to optimize decision-making policies. In this article, we provide a comprehensive analysis of existing benchmark frameworks and highlight a persistent paradox that challenges current approaches: depending on the training phase, exploring visual semantic information can either enhance or hinder the quality of learned representations. Furthermore, we reveal that over-redundancy commonly limits the sample efficiency of baseline methods. To address these limitations, we propose a novel plug-and-play approach for VRL. Our method employs a positive-unlabeled, policy-guided contrastive learning framework to jointly capture anti-redundant and policy-relevant pixel semantics during training. We validate the effectiveness of our approach through extensive benchmarking experiments, demonstrating superior performance over existing methods in pixel-based environments.
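The abstract does not provide implementation details, but a positive-unlabeled (PU) contrastive objective is often built by correcting a standard InfoNCE-style loss for positives hidden in the unlabeled pool. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch; the class prior `pi`, temperature, and the way embeddings are grouped into positives and an unlabeled pool are assumptions for exposition, not the paper's actual method.

```python
# Hypothetical sketch of a PU-corrected InfoNCE-style contrastive loss.
# Names (pi, temperature, the unlabeled pool) are illustrative assumptions,
# not the paper's implementation.
import torch
import torch.nn.functional as F

def pu_contrastive_loss(anchors, positives, unlabeled, pi=0.3, temperature=0.1):
    """Contrast anchors against known positives and an unlabeled pool.

    anchors:   (B, D) embeddings of augmented pixel observations
    positives: (B, D) embeddings treated as policy-relevant positives
    unlabeled: (B, M, D) embeddings whose policy relevance is unknown
    pi:        assumed prior that an unlabeled sample is actually positive
    """
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    unlabeled = F.normalize(unlabeled, dim=-1)

    # Similarity of each anchor to its positive and to the unlabeled pool.
    pos_sim = (anchors * positives).sum(-1, keepdim=True) / temperature     # (B, 1)
    unl_sim = torch.einsum('bd,bmd->bm', anchors, unlabeled) / temperature  # (B, M)

    # Plain InfoNCE treats every unlabeled sample as a negative. The PU-style
    # correction subtracts the expected contribution of positives hidden in
    # the unlabeled pool, clamped to stay non-negative.
    exp_pos = pos_sim.exp()
    exp_unl = unl_sim.exp().mean(-1, keepdim=True)
    neg_term = torch.clamp(exp_unl - pi * exp_pos, min=1e-8)

    loss = -torch.log(exp_pos / (exp_pos + unlabeled.shape[1] * neg_term))
    return loss.mean()
```

In a VRL training loop, such a term would typically be added alongside the actor-critic losses, with the policy used to decide which embeddings count as positives; the specific weighting and selection rule here are placeholders.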