Imagine Within Practice: Conservative Rollout Length Adaptation for Model-Based Reinforcement Learning

23 Sept 2023 (modified: 11 Feb 2024), submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: reinforcement learning, model-based reinforcement learning, adaptation, conservative imagination
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Model-based reinforcement learning (MBRL) algorithms achieve high sample efficiency by using imagined rollouts from a world model for policy optimization. A crucial hyperparameter in MBRL is the rollout length, which trades off data quality against efficiency by limiting the imagination horizon. While a longer rollout length improves efficiency, it introduces more unrealistic data due to compounding error, potentially leading to catastrophic performance deterioration. To prevent significant deviations between imagined rollouts and real transitions, most model-based methods manually tune a fixed rollout length for the entire training process. However, a fixed rollout length is not optimal for every rollout and does not effectively prevent the generation of unrealistic data. To tackle this problem, we propose Conservative Rollout Length Adaptation (CRLA), a novel method that conservatively restricts the agent from selecting actions that are rarely taken in the current state. CRLA truncates the rollout to preserve safety when there is a high probability of selecting infrequently taken actions. We apply our method to DreamerV3 and evaluate it on the Atari 100k benchmark. The results demonstrate that CRLA effectively balances data quality and efficiency by adjusting the rollout length, and that it achieves significant performance gains over DreamerV3 in its default setting on most Atari games.
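The truncation idea in the abstract can be illustrated with a minimal toy sketch. The abstract does not say how "rarely taken" is measured, so everything below is an assumption made for illustration: discrete hashable states, a per-state table of action counts, and the hypothetical names `rare_action_mass`, `adaptive_rollout`, `min_count`, and `mass_cutoff`. It is not the authors' implementation, and a latent-state world model such as DreamerV3 would need a different estimate of action frequency.

```python
import random


def rare_action_mass(policy_probs, counts, min_count):
    """Probability mass the policy places on actions taken fewer than
    `min_count` times in this state (hypothetical rarity criterion)."""
    return sum(p for a, p in enumerate(policy_probs) if counts[a] < min_count)


def adaptive_rollout(start_state, policy, model_step, visit_counts,
                     max_len=15, min_count=3, mass_cutoff=0.5, rng=None):
    """Imagine up to `max_len` steps, stopping early when the policy is
    likely to pick a rarely taken action in the current state.

    policy(state)        -> list of action probabilities
    model_step(s, a)     -> next imagined state
    visit_counts[state]  -> list of per-action counts from real experience
    All of these interfaces are assumptions for this sketch.
    """
    rng = rng or random.Random(0)
    state, trajectory = start_state, []
    for _ in range(max_len):
        probs = policy(state)
        if rare_action_mass(probs, visit_counts[state], min_count) > mass_cutoff:
            break  # conservative truncation: do not imagine past this state
        action = rng.choices(range(len(probs)), weights=probs)[0]
        next_state = model_step(state, action)
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


# Toy usage: a chain of integer states where action 1 was never tried in state 3.
counts = {s: [5, 5] for s in range(3)}
counts[3] = [5, 0]
rollout = adaptive_rollout(0, lambda s: [0.5, 0.5], lambda s, a: s + 1,
                           counts, max_len=10, mass_cutoff=0.4)
print(len(rollout))  # number of imagined steps before truncation
```

In this toy setting the rollout length varies per start state rather than being fixed in advance, which is the behavior the abstract attributes to CRLA; the specific threshold rule is only one possible way to realize it.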
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7484