On the Data-Efficiency with Contrastive Image Transformation in Reinforcement Learning


22 Sept 2022, 12:35 (modified: 19 Nov 2022, 11:07)
ICLR 2023 Conference Blind Submission
Readers: Everyone
Keywords: Reinforcement Learning, Data Augmentation, Self-Supervised Learning, Representation Learning
TL;DR: CoIT is a learnable image transformation for sample-efficiency improvement.
Abstract: Data-efficiency has always been an essential issue in pixel-based reinforcement learning (RL), because the agent must learn not only decision-making but also meaningful representations from images. Reinforcement learning with data augmentation has shown significant improvements in sample-efficiency. However, it is challenging to guarantee that a transformation is optimality-invariant; that is, the agent may readily recognize the augmented data as a completely different state. To this end, we propose contrastive invariant transformation (CoIT), a simple yet promising learnable data augmentation that can be combined with standard model-free algorithms to improve sample-efficiency. Concretely, the differentiable CoIT leverages original samples together with augmented samples and encourages the state encoder to learn a contrastive invariant embedding. We evaluate our approach on DeepMind Control Suite and Atari100K. Empirical results show that CoIT outperforms recent state-of-the-art methods on various tasks. Source code is available at https://github.com/Kamituna/CoIT.
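Illustration (not from the submission): the abstract does not specify CoIT's transformation or loss, so the following is only a minimal sketch of the general idea of a contrastive invariant embedding, assuming an InfoNCE-style objective. The names `affine_transform`, `encode`, and `info_nce`, and the parameters `scale`, `shift`, and `temperature`, are hypothetical and chosen for illustration; they are not taken from the authors' code. The sketch pairs each original observation with its augmented view and measures how well the embeddings of the two views match relative to the other samples in the batch.

```python
import math


def normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def affine_transform(obs, scale, shift):
    """Hypothetical augmentation: an affine intensity change.

    In a learnable setting, scale and shift would be trained parameters;
    here they are fixed numbers (an assumption made for this sketch).
    """
    return [scale * x + shift for x in obs]


def encode(obs):
    """Stand-in state encoder (identity); a real agent would use a network."""
    return list(obs)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    anchors[i] should be most similar to positives[i]; every other
    positives[j] in the batch acts as a negative.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        top = max(logits)  # subtract the max for numerical stability
        log_sum = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_sum - logits[i]
    return total / len(a)


# Toy batch of three "observations".
observations = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
augmented = [affine_transform(o, scale=0.9, shift=0.05) for o in observations]

original_emb = [encode(o) for o in observations]
augmented_emb = [encode(o) for o in augmented]

# Low loss: each augmented view is still recognized as the same state.
aligned_loss = info_nce(original_emb, augmented_emb)

# High loss: the views are deliberately mismatched by rotating the batch.
mismatched_loss = info_nce(original_emb, augmented_emb[1:] + augmented_emb[:1])
```

In this toy setting the loss is small when the augmented view stays close to its original in embedding space and large when the pairing is broken, which is the failure mode the abstract describes as augmented data being recognized as a completely different state.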
Supplementary Material: zip
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (eg, decision and control, planning, hierarchical RL, robotics)