Abstract: Although recent research has made progress in deep reinforcement learning from raw pixels, low sample efficiency remains a key challenge. Existing solutions often focus solely on extracting more effective state representations during the representation learning stage and overlook how to better utilize those representations during the policy learning stage. To address this, a simple and sample-efficient visual reinforcement learning method based on multiview optimization aggregation (MVOA-VRL) is proposed for pixel-based off-policy reinforcement learning frameworks. The method enables the agent to focus concurrently on learning and utilizing state representations. Specifically, MVOA-VRL obtains multiple views of each sample through random cropping and adaptive intensity adjustment, and then introduces optimization aggregation separately into the representation learning and reinforcement learning modules to aggregate the similarities, actions, and state values of the samples across views. In this way, MVOA-VRL promotes the agent's learning of effective representations and stable policies. Experimental results on continuous control tasks in the DMControl environment show that MVOA-VRL achieves higher scores than state-of-the-art methods and significantly improves sample efficiency.
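To make the two augmentations named above concrete, the following is a minimal PyTorch sketch of random cropping, random intensity adjustment, and DrQ-style averaging of critic estimates over several augmented views. It is an illustration under stated assumptions, not the paper's implementation: the function names (`random_crop`, `random_intensity`, `aggregated_q`), the padding and noise hyperparameters, and the choice of a simple mean as the aggregation rule are all hypothetical.

```python
# Sketch of multiview augmentation and view aggregation for a pixel-based
# off-policy critic. All names and hyperparameters are illustrative.
import torch
import torch.nn.functional as F


def random_crop(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """Pad each image, then crop back to the original size at a random offset."""
    n, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(imgs)
    for i in range(n):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out


def random_intensity(imgs: torch.Tensor, scale: float = 0.05) -> torch.Tensor:
    """Rescale pixel intensities by a per-image random factor centered at 1."""
    n = imgs.shape[0]
    factor = 1.0 + scale * torch.randn(n, 1, 1, 1, device=imgs.device)
    return imgs * factor.clamp(0.5, 1.5)


def aggregated_q(critic, obs: torch.Tensor, act: torch.Tensor,
                 num_views: int = 2) -> torch.Tensor:
    """Average the critic's Q-estimates over independently augmented views."""
    qs = [critic(random_intensity(random_crop(obs)), act)
          for _ in range(num_views)]
    return torch.stack(qs, dim=0).mean(dim=0)
```

In this reading, aggregating estimates across views reduces the variance that data augmentation would otherwise inject into the value targets, which is one plausible route to the more stable policy learning the abstract claims.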