PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning

May 21, 2021 (edited Jan 12, 2022) | NeurIPS 2021 Poster | Readers: Everyone
  • Keywords: Reinforcement learning, data efficiency, representation learning, augmentation, virtual trajectory, cycle consistency
  • TL;DR: We demonstrate that augmenting cycle-consistent virtual trajectories can significantly improve RL data efficiency.
  • Abstract: Learning good feature representations is important for deep reinforcement learning (RL). However, with limited experience, RL often suffers from data inefficiency in training. For unexperienced or less-experienced trajectories (i.e., state-action sequences), the lack of data limits their use for better feature learning. In this work, we propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance data efficiency for RL feature representation learning. Specifically, PlayVirtual predicts future states in a latent space from the current state and action using a dynamics model, and then predicts the previous states using a backward dynamics model, which forms a trajectory cycle. Based on this, we augment the actions to generate a large number of virtual state-action trajectories. Because these virtual trajectories lack ground-truth state supervision, we instead require each trajectory to satisfy the cycle consistency constraint, which can significantly enhance data efficiency. We validate the effectiveness of our designs on the Atari and DeepMind Control Suite benchmarks. Our method achieves state-of-the-art performance on both benchmarks. Our code is available at https://github.com/microsoft/Playvirtual.
  • Supplementary Material: pdf
  • Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.
  • Code: https://github.com/microsoft/Playvirtual
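
The trajectory cycle described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the additive "shift" dynamics, the squared Euclidean distance, and every name below (`forward_step`, `backward_step`, `cycle_consistency_loss`, `sample_virtual_actions`) are hypothetical choices made only to show the idea. A latent state is rolled forward under randomly sampled (virtual) actions, rolled back with a backward model, and compared with the starting state, so no ground-truth future state is needed.

```python
import random

def forward_step(z, a, fwd_shift):
    """Toy forward dynamics (assumption): next latent = latent + action-specific shift."""
    return [zi + si for zi, si in zip(z, fwd_shift[a])]

def backward_step(z_next, a, bwd_shift):
    """Toy backward dynamics (assumption): previous latent = next latent - shift."""
    return [zi - si for zi, si in zip(z_next, bwd_shift[a])]

def cycle_consistency_loss(z0, actions, fwd_shift, bwd_shift):
    """Roll forward through `actions`, roll back in reverse order, and return
    the squared Euclidean distance between z0 and its reconstruction."""
    z = z0
    for a in actions:              # forward virtual trajectory
        z = forward_step(z, a, fwd_shift)
    for a in reversed(actions):    # backward pass closes the cycle
        z = backward_step(z, a, bwd_shift)
    return sum((x - y) ** 2 for x, y in zip(z0, z))

def sample_virtual_actions(num_actions, horizon, rng):
    """Action augmentation: draw a random action sequence without stepping an environment."""
    return [rng.randrange(num_actions) for _ in range(horizon)]

rng = random.Random(0)
num_actions, latent_dim, horizon = 3, 4, 5

# Hypothetical "model parameters": one shift vector per discrete action.
fwd_shift = [[rng.uniform(-1, 1) for _ in range(latent_dim)] for _ in range(num_actions)]
consistent_bwd = [row[:] for row in fwd_shift]                    # exactly inverts the forward model
inconsistent_bwd = [[s + 0.5 for s in row] for row in fwd_shift]  # deliberately mismatched

z0 = [0.0] * latent_dim
actions = sample_virtual_actions(num_actions, horizon, rng)

loss_consistent = cycle_consistency_loss(z0, actions, fwd_shift, consistent_bwd)
loss_inconsistent = cycle_consistency_loss(z0, actions, fwd_shift, inconsistent_bwd)
print(loss_consistent, loss_inconsistent)
```

In this toy setting the loss is near zero when the backward model inverts the forward model and grows when it does not. In a learned setting, a loss of this kind would be minimized with respect to the encoder and both dynamics models; the distance metric and model architectures used in the paper may differ from the ones assumed here.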