MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning

Mao Hong; Zhiyue Zhang; Yue Wu; Yanxun Xu

23 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Offline Reinforcement Learning, Mirror Ascent, Model-based, PAC Guarantee

TL;DR: We propose a practically implementable model-based mirror ascent algorithm for offline RL with theoretical guarantees.

Abstract: Model-based offline reinforcement learning (RL) methods have achieved state-of-the-art performance in many decision-making problems thanks to their sample efficiency and generalizability. However, prior model-based offline RL methods either demonstrate their success solely through empirical studies, or provide algorithms that have theoretical guarantees but are hard to implement in practice. To date, a practically implementable algorithm for model-based offline RL with theoretical guarantees is still lacking. To fill this gap, we develop MoMA, a model-based mirror ascent algorithm with general function approximation under partial coverage of the offline data. MoMA alternates between two steps. In the policy evaluation step, it conservatively estimates the value function by minimizing over a confidence set of transition models. In the policy improvement step, it updates the policy with general function approximation instead of the commonly used parametric policy classes. Under mild assumptions, we establish theoretical guarantees for the proposed algorithm by proving an upper bound on the suboptimality of the returned policy. The effectiveness of the proposed algorithm is demonstrated via numerical studies.
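The two-step loop described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: it assumes a small finite MDP with known rewards, replaces the confidence set of transition models with a hypothetical finite list of candidate models, and uses the standard KL-regularized (exponentiated) mirror ascent update in place of general function approximation. All function and parameter names (`evaluate_policy`, `conservative_q`, `mirror_ascent_step`, `moma_sketch`, `eta`, `mu0`) are illustrative assumptions.

```python
import numpy as np

def evaluate_policy(P, R, pi, gamma):
    """Exact V^pi and Q^pi for a tabular MDP.
    P: (S, A, S) transitions, R: (S, A) rewards, pi: (S, A) policy."""
    S, _ = R.shape
    P_pi = np.einsum("sa,sat->st", pi, P)    # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)            # expected one-step reward per state
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * (P @ V)                  # (S, A, S) @ (S,) -> (S, A)
    return V, Q

def conservative_q(models, R, pi, gamma, mu0):
    """Policy evaluation step (assumed form): among the candidate models,
    return the value and Q of the one with the smallest value under mu0."""
    best_val, best_q = None, None
    for P in models:
        V, Q = evaluate_policy(P, R, pi, gamma)
        val = float(mu0 @ V)
        if best_val is None or val < best_val:
            best_val, best_q = val, Q
    return best_val, best_q

def mirror_ascent_step(log_pi, Q, eta):
    """Policy improvement step (assumed form): pi_new is proportional to
    pi * exp(eta * Q), computed in log space for numerical stability."""
    logits = log_pi + eta * Q
    m = logits.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    return logits - log_norm

def moma_sketch(models, R, gamma=0.9, eta=1.0, iters=50, mu0=None):
    """Alternate pessimistic evaluation and mirror ascent updates."""
    S, A = R.shape
    mu0 = np.full(S, 1.0 / S) if mu0 is None else mu0
    log_pi = np.full((S, A), -np.log(A))     # start from the uniform policy
    for _ in range(iters):
        _, Q = conservative_q(models, R, np.exp(log_pi), gamma, mu0)
        log_pi = mirror_ascent_step(log_pi, Q, eta)
    return np.exp(log_pi)
```

Calling `moma_sketch([P1, P2], R)` with two candidate transition arrays of shape `(S, A, S)` and a reward array of shape `(S, A)` returns an `(S, A)` policy whose rows sum to one. Taking the minimum over a finite list is only a stand-in for the minimization over a confidence set; a data-driven confidence set would have to be constructed from the offline dataset.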

Primary Area: reinforcement learning

Submission Number: 7620
