SelfDreamer: Dual-Prototypical Regularization for Frame-masked Model-based Reinforcement Learning

21 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: reinforcement learning, prototypical learning, deep learning
Abstract: In reinforcement learning (RL), agents are conventionally trained in unknown environments from extensive experience comprising high-dimensional state representations (typically images), actions, and rewards. This standard setup imposes substantial data-transmission overhead when edge devices collect the data and cloud servers train the model. This paper introduces a paradigm termed "frame-masked RL," devised to improve data efficiency, and examines its impact on existing methods. We also introduce a model-based algorithm, "SelfDreamer," tailored to mitigate the information loss incurred by frame masking. SelfDreamer leverages action-transition dual prototypes to embed action information in the world model and to align the hidden states in the representation space. Empirical evaluations on six continuous control tasks from the DeepMind Control Suite show that SelfDreamer matches or exceeds state-of-the-art methods while utilizing only half of the observations from the environment.
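
The abstract does not specify how the dual-prototypical regularizer is implemented; the sketch below is only one plausible reading of "action-transition dual prototypes that align hidden states in the representation space." All names (DualPrototypeRegularizer, num_prototypes, temperature) and the SwAV-style soft-assignment loss are assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPrototypeRegularizer(nn.Module):
    """Hypothetical sketch: two learnable prototype banks (action-related and
    transition-related). Latents the world model imagines for masked frames are
    softly assigned to prototypes, and those assignments are pushed to agree
    with the assignments of the corresponding observed (unmasked) latents."""

    def __init__(self, latent_dim: int, num_prototypes: int = 64, temperature: float = 0.1):
        super().__init__()
        # Prototype banks are plain weight matrices, normalized at use time.
        self.action_protos = nn.Parameter(torch.randn(num_prototypes, latent_dim))
        self.transition_protos = nn.Parameter(torch.randn(num_prototypes, latent_dim))
        self.temperature = temperature

    def _logits(self, z: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
        # Cosine-similarity logits between latents and prototypes.
        z = F.normalize(z, dim=-1)
        protos = F.normalize(protos, dim=-1)
        return z @ protos.t() / self.temperature

    def forward(self, z_observed: torch.Tensor, z_masked: torch.Tensor) -> torch.Tensor:
        # z_observed: latents from unmasked frames (treated as targets, no gradient).
        # z_masked:   latents predicted by the world model for masked frames.
        loss = torch.zeros((), device=z_masked.device)
        for protos in (self.action_protos, self.transition_protos):
            with torch.no_grad():
                target = F.softmax(self._logits(z_observed, protos), dim=-1)
            logits = self._logits(z_masked, protos)
            loss = loss + F.cross_entropy(logits, target)  # soft-target cross-entropy
        return loss

# Usage (shapes are illustrative): add the regularizer to the world-model loss.
reg = DualPrototypeRegularizer(latent_dim=200)
aux_loss = reg(z_observed=torch.randn(32, 200), z_masked=torch.randn(32, 200))
```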
Supplementary Material: zip
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3766