Live in the Moment: Learning Dynamics Model Adapted to Evolving Policy

Xiyao Wang; Wichayaporn Wongkamjan; Ruonan Jia; Furong Huang

22 Sept 2022 (modified: 27 Apr 2025) · ICLR 2023 Conference Withdrawn Submission · Readers: Everyone

Keywords: Reinforcement Learning, Model-based Reinforcement Learning, State-action Visitation Distribution, Distribution Shift, Policy-adapted Dynamics Model Learning

TL;DR: We theoretically analyze how the distribution of historical policies affects model learning and model rollouts, and propose a novel model learning method for model-based RL.

Abstract: Model-based reinforcement learning (RL) often achieves higher sample efficiency in practice than model-free RL because it learns a dynamics model that generates samples for policy learning. Previous works fit the dynamics model to the empirical state-action visitation distribution of all historical policies, i.e., the sample replay buffer. However, in this paper we observe that fitting the dynamics model under the distribution of all historical policies does not necessarily benefit model prediction for the current policy, because the policy in use evolves constantly during training. This evolving policy causes shifts in the state-action visitation distribution. We theoretically analyze how this distribution shift over historical policies affects model learning and model rollouts. We then propose a novel dynamics model learning method, Policy-adapted Dynamics Model Learning (PDML). PDML dynamically adjusts the historical policy mixture distribution so that the learned model continually adapts to the state-action visitation distribution of the evolving policy. Experiments on a range of continuous control environments in MuJoCo show that, when combined with state-of-the-art model-based RL methods, PDML significantly improves sample efficiency and achieves higher asymptotic performance.
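The abstract does not spell out how PDML reweights the historical policy mixture, so the sketch below only illustrates the general idea of fitting a dynamics model under a reweighted policy mixture instead of the uniform replay-buffer distribution. Everything in it is an assumption for illustration: the names `Transition`, `mixture_weights`, `sample_weights`, `fit_linear_model` and `mse` are hypothetical, the recency-based exponential weighting with a `temperature` parameter stands in for PDML's actual adjustment rule, and a scalar linear model on the toy dynamics `next_state = state + sin(action)` stands in for a learned neural dynamics model.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Transition:
    state: float
    action: float
    next_state: float
    policy_id: int  # training iteration of the policy that collected this sample


def mixture_weights(policy_ids, current_id, temperature=1.0):
    """Hypothetical recency-based mixture over historical policies (not PDML's rule).

    Policy k gets weight proportional to exp(-(current_id - k) / temperature),
    so policies collected closer to the current iteration dominate. Sums to 1.
    """
    raw = {k: math.exp(-(current_id - k) / temperature) for k in set(policy_ids)}
    total = sum(raw.values())
    return {k: w / total for k, w in raw.items()}


def sample_weights(buffer, mixture):
    """Spread each policy's mixture weight evenly over the samples it collected."""
    counts = Counter(t.policy_id for t in buffer)
    return [mixture[t.policy_id] / counts[t.policy_id] for t in buffer]


def fit_linear_model(buffer, weights):
    """Weighted least squares for the toy model next_state ~ th1 * state + th2 * action."""
    pairs = list(zip(buffer, weights))
    a = sum(w * t.state * t.state for t, w in pairs)
    b = sum(w * t.state * t.action for t, w in pairs)
    c = sum(w * t.action * t.action for t, w in pairs)
    d = sum(w * t.state * t.next_state for t, w in pairs)
    e = sum(w * t.action * t.next_state for t, w in pairs)
    det = a * c - b * b  # assumes states and actions are not collinear
    # Solve the 2x2 normal equations by Cramer's rule.
    return (d * c - b * e) / det, (a * e - b * d) / det


def mse(theta, data):
    """Mean squared one-step prediction error of the linear model on `data`."""
    th1, th2 = theta
    return sum((th1 * t.state + th2 * t.action - t.next_state) ** 2 for t in data) / len(data)


# Toy setup (an assumption): true dynamics next_state = state + sin(action).
def step(s, a):
    return s + math.sin(a)


# Old policy (iteration 0) always takes action 2.0; the current policy (iteration 5)
# takes small actions, so the two state-action visitation distributions differ.
old = [Transition(s, 2.0, step(s, 2.0), policy_id=0) for s in range(4)]
new = [Transition(s, a, step(s, a), policy_id=5)
       for s, a in zip(range(4), (-0.2, -0.1, 0.1, 0.2))]
buffer = old + new

uniform = [1 / len(buffer)] * len(buffer)  # plain replay-buffer distribution
adapted = sample_weights(
    buffer, mixture_weights([t.policy_id for t in buffer], current_id=5, temperature=0.5)
)

print("uniform fit, error on current policy:", mse(fit_linear_model(buffer, uniform), new))
print("adapted fit, error on current policy:", mse(fit_linear_model(buffer, adapted), new))
```

Each policy's mixture weight is spread evenly over the samples it collected, so the weighting acts on the mixture over policies rather than on individual transitions. In this toy setup, the old policy's large actions pull the uniform fit away from the region the current policy visits, which is the kind of mismatch the abstract describes; the recency-weighted fit tracks the current policy's region much more closely.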

Area: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)

Community Implementations: [1 code implementation](https://www.catalyzex.com/paper/live-in-the-moment-learning-dynamics-model/code) (CatalyzeX)