Collaborative World Models: An Online-Offline Transfer RL Approach

16 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: World models, reinforcement learning, visual control, transfer learning
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: Training offline reinforcement learning (RL) models with visual inputs is challenging because the overfitting problem in representation learning is coupled with the risk of overestimating the true value functions. Recent work has attempted to alleviate the overestimation bias by encouraging conservatism toward behaviors beyond the scope of the offline dataset. This paper, in contrast, builds flexible constraints for the offline policies without impeding the exploration of potentially advantageous behaviors. The key idea is to leverage an off-the-shelf RL simulator, with which the agent can easily interact in an online manner. In this auxiliary domain, we run an actor-critic algorithm whose value model is aligned with the target data and thus serves as a “$\textit{test bed}$” for the offline policies. In this way, the online simulator can be used as a $\textit{playground}$ for the offline agent, allowing for mildly-conservative value estimation. Experimental results demonstrate the remarkable effectiveness of our approach in challenging environments such as DeepMind Control, Meta-World, and RoboDesk, where it outperforms existing offline visual RL approaches by substantial margins.
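The abstract describes the mechanism only at a high level. Below is a minimal, self-contained sketch of the idea, assuming PyTorch and toy vector observations; every name and coefficient here (`critic_on`, `align_coef`, `fake_batch`, ...) is a hypothetical illustration, not the paper's actual algorithm or released code. It shows an online-simulator critic being aligned to target-domain values and then used, together with the offline critic, as a mildly-conservative test bed for the offline policy.

```python
# Minimal sketch (NOT the paper's code) of aligning an online simulator's critic
# to target-domain data and using it as a "test bed" for an offline policy.
# All names and coefficients (align_coef, fake_batch, ...) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, gamma, align_coef = 8, 2, 0.99, 1.0

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.ReLU(), nn.Linear(64, out))

critic_off = mlp(obs_dim + act_dim, 1)  # trained on the offline target dataset
critic_on = mlp(obs_dim + act_dim, 1)   # trained in the auxiliary online simulator
actor = nn.Sequential(mlp(obs_dim, act_dim), nn.Tanh())
critic_opt = torch.optim.Adam(
    list(critic_off.parameters()) + list(critic_on.parameters()), lr=3e-4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def q(critic, obs, act):
    return critic(torch.cat([obs, act], dim=-1)).squeeze(-1)

def td_loss(critic, batch):
    obs, act, rew, next_obs = batch
    with torch.no_grad():  # one-step TD target, no gradient into the actor
        target = rew + gamma * q(critic, next_obs, actor(next_obs))
    return F.mse_loss(q(critic, obs, act), target)

def fake_batch(n=32):  # stand-in for real transitions in this toy sketch
    return (torch.randn(n, obs_dim), torch.rand(n, act_dim) * 2 - 1,
            torch.randn(n), torch.randn(n, obs_dim))

for step in range(200):
    off, on = fake_batch(), fake_batch()  # offline data / simulator rollouts
    # Critic update: TD losses on both domains, plus an alignment term pulling
    # the online critic toward the offline critic's values on target data.
    align = F.mse_loss(q(critic_on, off[0], off[1]),
                       q(critic_off, off[0], off[1]).detach())
    c_loss = td_loss(critic_off, off) + td_loss(critic_on, on) + align_coef * align
    critic_opt.zero_grad()
    c_loss.backward()
    critic_opt.step()
    # Actor update: the aligned online critic serves as the test bed; taking the
    # elementwise minimum of the two estimates gives a mildly-conservative signal.
    pi = actor(off[0])
    a_loss = -torch.min(q(critic_on, off[0], pi), q(critic_off, off[0], pi)).mean()
    actor_opt.zero_grad()
    a_loss.backward()
    actor_opt.step()
```

Taking the minimum over the two critics is one plausible reading of "mildly-conservative": the offline policy only exploits advantages that the target-aligned online critic also confirms, rather than being constrained to stay inside the dataset support.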
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
Supplementary Material: zip
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 537