Keywords: Unsupervised Dynamics Generalization, Model-Based Reinforcement Learning
Abstract: By incorporating an environment-specific factor into the dynamics prediction, model-based reinforcement learning (MBRL) can generalise to environments with diverse dynamics. In most real-world scenarios the environment-specific factor is not observable, so existing methods attempt to estimate it from historical transition segments. Nevertheless, earlier methods fail to produce distinct clusters for the environment-specific factors learned from different environments, resulting in poor performance. To address this issue, we introduce a set of environmental prototypes, one to serve as the environment-specific representation of each environment. Encouraging each learned environment-specific factor to resemble its assigned environmental prototype more closely enhances the discrimination between factors estimated from distinct environments. To learn such prototypes, we first construct a prototype for each sampled trajectory and then hierarchically combine trajectory prototypes with similar semantics into one environmental prototype. Experiments demonstrate that the environment-specific factors estimated by our method cluster better than those of earlier methods and consistently improve MBRL's generalisation performance across six environments.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)
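
The two-stage prototype construction summarised in the abstract can be illustrated with a minimal sketch. The abstract does not specify the pooling rule, the similarity measure, the merging criterion, or the loss, so everything below is an assumption made for illustration: mean pooling for trajectory prototypes, cosine similarity, a greedy agglomerative merge controlled by a hypothetical `sim_threshold`, and a softmax cross-entropy loss with a hypothetical `temperature`. It is not the authors' implementation.

```python
import numpy as np

def trajectory_prototypes(factors_per_traj):
    """One prototype per trajectory: the unit-normalised mean of its
    estimated factors (mean pooling is an assumption)."""
    protos = np.stack([np.mean(f, axis=0) for f in factors_per_traj])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def merge_into_env_prototypes(traj_protos, sim_threshold=0.9):
    """Greedy agglomerative merge (assumed): repeatedly fuse the two most
    cosine-similar clusters until no pair exceeds sim_threshold."""
    clusters = [[i] for i in range(len(traj_protos))]
    centers = [p.copy() for p in traj_protos]
    while len(clusters) > 1:
        stacked = np.stack(centers)
        sim = stacked @ stacked.T          # cosine similarity of unit vectors
        np.fill_diagonal(sim, -np.inf)     # ignore self-similarity
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        if sim[i, j] < sim_threshold:
            break
        i, j = int(min(i, j)), int(max(i, j))
        clusters[i].extend(clusters.pop(j))
        centers.pop(j)
        merged = traj_protos[clusters[i]].mean(axis=0)
        centers[i] = merged / np.linalg.norm(merged)
    return np.stack(centers), clusters

def prototype_loss(factors, env_protos, assignments, temperature=0.1):
    """Cross-entropy over factor-to-prototype similarities (assumed form):
    low when each factor is closest to its assigned prototype."""
    logits = factors @ env_protos.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(factors)), assignments].mean()
```

Here `factors_per_traj` is a list with one array of shape `(num_segments, dim)` per sampled trajectory, and `assignments` holds, for each factor, the index of the environmental prototype it was assigned to. Minimising `prototype_loss` pulls each factor toward its assigned prototype and away from the others, which is one plausible way to obtain the sharper clustering the abstract describes.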