On Gap-dependent Bounds for Offline Reinforcement Learning

Published: 31 Oct 2022, Last Modified: 12 Jan 2023 · NeurIPS 2022 Accept
Keywords: offline reinforcement learning, gap-dependent
TL;DR: A systematic study on gap-dependent bounds for tabular reinforcement learning with different assumptions and both upper and lower bounds.
Abstract: This paper presents a systematic study of gap-dependent sample complexity in offline reinforcement learning. Prior works showed that when the density ratio between an optimal policy and the behavior policy is upper bounded (single policy coverage), the agent can achieve an $O\left(\frac{1}{\epsilon^2}\right)$ rate, which is also minimax optimal. We show that under the same single policy coverage assumption, the rate can be improved to $O\left(\frac{1}{\epsilon}\right)$ when there is a gap in the optimal $Q$-function. Furthermore, we show that under a stronger uniform single policy coverage assumption, the sample complexity can be further improved to $O(1)$. Lastly, we present nearly matching lower bounds to complement our gap-dependent upper bounds.