ReaPER: Improving Sample Efficiency in Model-Based Latent Imagination

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Blind Submission · Readers: Everyone
Keywords: model-based reinforcement learning, visual control, sample efficiency
Abstract: Deep Reinforcement Learning (DRL) can distill behavioural policies from sensory input that solve complex tasks; however, the resulting policies tend to be task-specific and sample-inefficient, requiring a large number of environment interactions that may be costly or impractical for many real-world applications. Model-based DRL (MBRL) allows learned behaviours and dynamics from one task to be transferred to a new task in a related environment, but it still suffers from low sample efficiency. In this work we introduce ReaPER, an algorithm that addresses the sample-efficiency challenge in model-based DRL, and we demonstrate the proposed solution on the DeepMind Control benchmark. Our improvements are driven by sparse, self-supervised, contrastive model representations and by efficient use of past experience. We empirically analyze each novel component of ReaPER and how it contributes to sample efficiency, and we show how other standard alternatives fail to improve upon previous methods. Code will be made available.
One-sentence Summary: We introduce ReaPER, an algorithm that addresses the sample-efficiency challenge in model-based DRL, and demonstrate the proposed solution on the DeepMind Control benchmark.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Supplementary Material: zip
Reviewed Version (pdf): https://openreview.net/references/pdf?id=AwrXdmSQi
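
The abstract mentions self-supervised, contrastive model representations as one driver of the reported gains. The page does not specify ReaPER's actual objective, so the following is only a minimal sketch of a generic InfoNCE-style contrastive loss of the kind such methods typically use; the function name, shapes, and temperature are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical InfoNCE-style contrastive objective for latent representations.
# NOT ReaPER's actual loss, which is not specified on this page.
import torch
import torch.nn.functional as F

def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """Contrast each anchor latent with its matching positive.

    anchors, positives: (batch, dim) latent codes of two views of the same
    observation (e.g. augmented frames or adjacent timesteps). The other
    positives in the batch act as negatives for each anchor.
    """
    a = F.normalize(anchors, dim=1)
    p = F.normalize(positives, dim=1)
    logits = a @ p.t() / temperature                    # (batch, batch) similarities
    labels = torch.arange(a.size(0), device=a.device)   # diagonal = matching pairs
    return F.cross_entropy(logits, labels)

# Usage with dummy encoder outputs:
z1 = torch.randn(32, 128)
z2 = torch.randn(32, 128)
loss = info_nce_loss(z1, z2)
```

Minimizing this loss pulls paired latents together while pushing apart latents from different observations, which is the general mechanism contrastive representation learning relies on in visual-control settings like the DeepMind Control benchmark.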