A PERSPECTIVE OF IMPROPER DYNAMICS ON OFFLINE MODEL-BASED PLANNING

18 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Offline RL, model-based RL, deep RL, planning
TL;DR: We propose a method aimed at addressing the issues in offline model-based planning that arise due to an inaccurate dynamics model.
Abstract: By learning a dynamics model and then estimating and planning in the latent state space, MuZero and its variants perform well in complex environments. However, the performance of these algorithms requires accurate dynamics and prediction models, which may be difficult to obtain in offline reinforcement learning because of the lack of interaction with the environment. Recent works attempt to use one-step rollouts to reduce the cumulative rollout error caused by an inaccurate dynamics model. We argue that the planning issues of MuZero-type methods are mainly caused by inaccurate models. To address this issue, we propose a robust method, Constrained Offline Model-based Planning (COMP), for training the dynamics or prediction models more smoothly. COMP adds a specifically designed noise to the latent state and aims to align the value and dynamics of the perturbed states with those of the unperturbed states. Our method can be combined with MuZero and its derived algorithms to improve planning performance in offline settings. Experiments show that our proposed method achieves notable performance on most Atari game tasks in the RL Unplugged benchmark.
Primary Area: reinforcement learning
Submission Number: 1225
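
The abstract describes perturbing the latent state and aligning the value and dynamics of the perturbed state with those of the unperturbed one. The sketch below illustrates that general idea only, under stated assumptions: the abstract does not specify the noise design or the loss, so the Gaussian noise, the squared-error penalty, the function name `perturbation_consistency_loss`, and the toy `dynamics_fn` and `value_fn` are all hypothetical and are not the authors' implementation.

```python
import numpy as np


def perturbation_consistency_loss(latent, dynamics_fn, value_fn,
                                  noise_scale=0.1, rng=None):
    """Penalize disagreement between model outputs on clean and noisy latents.

    Assumption: isotropic Gaussian noise and a mean-squared-error penalty.
    The abstract only says the noise is "specifically designed"; the actual
    design is not given there.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = noise_scale * rng.standard_normal(latent.shape)
    perturbed = latent + noise

    # Value alignment: the perturbed state should have a similar value.
    value_gap = np.mean((value_fn(perturbed) - value_fn(latent)) ** 2)
    # Dynamics alignment: the perturbed state should lead to a similar next state.
    dynamics_gap = np.mean((dynamics_fn(perturbed) - dynamics_fn(latent)) ** 2)
    return value_gap + dynamics_gap


# Toy stand-ins for learned networks (hypothetical, for illustration only).
rng = np.random.default_rng(0)
W_dyn = rng.standard_normal((4, 4))
w_val = rng.standard_normal(4)


def dynamics_fn(s):
    return np.tanh(s @ W_dyn)


def value_fn(s):
    return s @ w_val


latent = rng.standard_normal((8, 4))  # batch of 8 latent states, dimension 4
loss_zero = perturbation_consistency_loss(latent, dynamics_fn, value_fn,
                                          noise_scale=0.0, rng=rng)
loss_small = perturbation_consistency_loss(latent, dynamics_fn, value_fn,
                                           noise_scale=0.1, rng=rng)
```

In a training loop, a term of this kind would presumably be added to the usual model losses as a regularizer, so that small changes in the latent state do not produce large changes in predicted value or next state.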