Offline Model-based Adaptable Policy Learning

Xiong-Hui Chen; Yang Yu; Qingyang Li; Fan-Ming Luo; Zhiwei Tony Qin; Shang Wenjie; Jieping Ye

Offline Model-based Adaptable Policy Learning

Xiong-Hui Chen, Yang Yu, Qingyang Li, Fan-Ming Luo, Zhiwei Tony Qin, Shang Wenjie, Jieping Ye

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 PosterReaders: Everyone

Keywords: Reinforcement Learning

TL;DR: We investigate the decision-making problems in out-of-support regions directly in offline model-based learning setting. As a result, we propose a new paradigm of offline model-based learning through learning to adapt

Abstract: In reinforcement learning, a promising direction to avoid online trial-and-error costs is learning from an offline dataset. Current offline reinforcement learning methods commonly learn in the policy space constrained to in-support regions by the offline dataset, in order to ensure the robustness of the outcome policies. Such constraints, however, also limit the potential of the outcome policies. In this paper, to release the potential of offline policy learning, we investigate the decision-making problems in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). By this approach, instead of learning in in-support regions, we learn an adaptable policy that can adapt its behavior in out-of-support regions when deployed. We conduct experiments on MuJoCo controlling tasks with offline datasets. The results show that the proposed method can make robust decisions in out-of-support regions and achieve better performance than SOTA algorithms.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

Code: https://github.com/xionghuichen/MAPLE

20 Replies

Loading