Pessimistic Model-Based Actor-Critic for Offline Reinforcement Learning: Theory and Algorithms

Mao Hong; Yue Wu; Yanxun Xu

22 Sept 2022 (modified: 13 Feb 2023)

ICLR 2023 Conference Withdrawn Submission

Readers: Everyone

Keywords: Actor-critic, Model-based offline RL, PAC guarantee, Pessimism

Abstract: Model-based offline reinforcement learning (RL) has outperformed model-free RL in many decision-making problems owing to its sample efficiency and generalizability. However, prior model-based offline RL methods either demonstrate their success only through empirical studies or provide algorithms that have theoretical guarantees but are hard to implement in practice. To date, a general, computationally tractable algorithm for model-based offline RL with PAC guarantees is still lacking. To fill this gap, we develop a pessimistic model-based actor-critic (PeMACO) algorithm with general function approximation, assuming partial coverage of the offline dataset. Specifically, the critic provides a pessimistic Q-function by incorporating the uncertainty of the learned transition model, and the actor updates the policy using approximations of the pessimistic Q-function. Under some mild assumptions, we establish PAC guarantees for PeMACO by proving upper bounds on the suboptimality of the policy it returns.
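
The critic/actor split described in the abstract can be illustrated with a toy sketch. This is not the PeMACO algorithm, whose details are not given on this page. It assumes a tabular MDP, a small hypothetical set of candidate transition models standing in for model uncertainty, a critic that evaluates the policy under the least favorable candidate, and a softmax (multiplicative-weights) actor update. All names (`q_values`, `pessimistic_q`, `actor_step`, `make_model`) and all numbers are made up for illustration.

```python
import numpy as np

def q_values(P, r, pi, gamma):
    """Exact Q^pi for a tabular model P[s, a, s'] with rewards r[s, a]."""
    n_states = r.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state kernel under pi
    r_pi = (pi * r).sum(axis=1)             # expected one-step reward under pi
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
    return r + gamma * P @ v                # Q(s, a) = r(s, a) + gamma * E[V(s')]

def pessimistic_q(models, r, pi, gamma, mu0):
    """Critic (assumed form): Q^pi under the candidate model with the lowest value."""
    qs = [q_values(P, r, pi, gamma) for P in models]
    values = [mu0 @ (pi * q).sum(axis=1) for q in qs]
    return qs[int(np.argmin(values))]

def actor_step(pi, q, eta):
    """Actor (assumed form): pi_new(a|s) proportional to pi(a|s) * exp(eta * Q(s, a))."""
    logits = np.log(np.clip(pi, 1e-12, None)) + eta * q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)

# Toy 2-state, 2-action problem. State 1 is absorbing with zero reward.
gamma, eta = 0.9, 0.5
mu0 = np.array([1.0, 0.0])                  # always start in state 0
r = np.array([[0.5, 1.0], [0.0, 0.0]])

def make_model(next_state_for_action_1):
    """Candidate models agree everywhere except on action 1 taken in state 0."""
    P = np.zeros((2, 2, 2))
    P[0, 0, 0] = 1.0                        # action 0 in state 0: stay in state 0
    P[0, 1, next_state_for_action_1] = 1.0  # action 1 in state 0: model-dependent
    P[1, :, 1] = 1.0                        # state 1 is absorbing
    return P

models = [make_model(0), make_model(1)]
pi = np.full((2, 2), 0.5)                   # uniform initial policy
for _ in range(5):
    q = pessimistic_q(models, r, pi, gamma, mu0)
    pi = actor_step(pi, q, eta)
print(pi[0])                                # action probabilities in state 0
```

In this toy problem the two candidate models disagree only about where action 1 leads from state 0. Evaluating under the first model alone would favor action 1, whereas the pessimistic critic steers the actor toward action 0, whose outcome both models agree on.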

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)
