Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs

Yuan Cheng; Ruiquan Huang; Yingbin Liang; Jing Yang

Published: 01 Feb 2023, Last Modified: 27 Feb 2023 | ICLR 2023 poster | Readers: Everyone

Keywords: Reward Free Exploration, Representation Learning, Sample Complexity, Model-Based RL

TL;DR: We propose a novel reward-free reinforcement learning algorithm under low-rank MDPs that improves on the sample complexity of previous work. We also provide a lower bound. Finally, we study representation learning via reward-free reinforcement learning.

Abstract: In reward-free reinforcement learning (RL), an agent first explores the environment without any reward information, so that it can afterwards achieve certain learning goals for any given reward. In this paper, we focus on reward-free RL under low-rank MDP models, in which both the representation and the linear weight vectors are unknown. Although various algorithms have been proposed for reward-free low-rank MDPs, the corresponding sample complexity is still far from satisfactory. In this work, we begin by providing the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs. This lower bound implies that it is strictly harder to find a near-optimal policy under low-rank MDPs than under linear MDPs. We then propose a novel model-based algorithm, coined RAFFLE, and show that it can both find an $\epsilon$-optimal policy and achieve $\epsilon$-accurate system identification via reward-free exploration, with a sample complexity that significantly improves on previous results. This sample complexity matches our lower bound in its dependence on $\epsilon$, as well as on $K$ in the large $d$ regime, where $d$ and $K$ denote the representation dimension and the action space cardinality, respectively. Finally, we provide a planning algorithm (requiring no further interaction with the true environment) for RAFFLE to learn a near-accurate representation, which is the first known representation learning guarantee under the same setting.
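
For context, the low-rank MDP model named in the abstract is commonly written as the factorization below. This is the standard formulation from the literature, not a statement quoted from this page; the symbols $\phi_h^\star$, $\mu_h^\star$, and the step index $h$ (assuming an episodic setting) are assumed notation.

```latex
P_h(s' \mid s, a) = \big\langle \phi_h^\star(s, a),\, \mu_h^\star(s') \big\rangle,
\qquad \phi_h^\star(s, a),\ \mu_h^\star(s') \in \mathbb{R}^d
```

Under this reading, a linear MDP is the special case in which the representation $\phi_h^\star$ is given to the learner, whereas in a low-rank MDP both $\phi_h^\star$ and the weights $\mu_h^\star$ must be learned.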

Please Choose The Closest Area That Your Submission Falls Into: Reinforcement Learning (e.g., decision and control, planning, hierarchical RL, robotics)
