Temporal prediction model with context-aware data augmentation for robust visual reinforcement learning

Published: 01 Jan 2024, Last Modified: 13 Nov 2024 · Neural Comput. Appl. 2024 · CC BY-SA 4.0
Abstract: While reinforcement learning has shown promising ability to solve continuous control tasks from visual inputs, it remains challenging to learn robust representations from high-dimensional observations and to generalize to unseen environments with distracting elements. Recently, strong data augmentation has been applied to increase the diversity of the training data, but it may corrupt task-relevant pixels and thus hinder the optimization of reinforcement learning. To this end, this paper proposes the temporal prediction model with context-aware data augmentation (TPMC), a framework that incorporates context-aware strong augmentation into a dynamics model for learning robust policies. Specifically, TPMC utilizes a gradient-based saliency map to identify and preserve task-relevant pixels during strong augmentation, generating reliable augmented images for stable training. Moreover, temporal prediction consistency between strongly and weakly augmented views is enforced to construct a contrastive objective for learning shared task-relevant representations. Extensive experiments are conducted to evaluate performance on the DMControl-GB benchmark and several robotic manipulation tasks. Experimental results demonstrate that TPMC achieves superior data efficiency and generalization compared to other state-of-the-art methods.
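The context-aware augmentation described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the quantile-based threshold, and the assumption that a per-pixel saliency map is already available (e.g. from gradients of a value estimate with respect to the observation) are all hypothetical choices for illustration. The idea is simply to keep the most salient (task-relevant) pixels from the original observation and apply the strong augmentation only elsewhere.

```python
import numpy as np

def context_aware_augment(obs, saliency, strong_aug, keep_quantile=0.9):
    """Blend a strongly augmented view with the original observation.

    Pixels whose saliency falls in the top (1 - keep_quantile) fraction are
    treated as task-relevant and kept unchanged; all other pixels are taken
    from the strongly augmented image. Names and threshold are illustrative.
    """
    thresh = np.quantile(saliency, keep_quantile)
    mask = (saliency >= thresh).astype(obs.dtype)  # 1 = preserve original pixel
    return mask * obs + (1.0 - mask) * strong_aug(obs)
```

In practice the saliency map would be derived from gradients of the critic or policy output with respect to the input image, so that the preserved region tracks what the agent actually attends to rather than a fixed crop.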