Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks

29 Sept 2021 (modified: 13 Feb 2023) · ICLR 2022 Conference Withdrawn Submission
Keywords: offline reinforcement learning, deep ReLU networks, function approximation
Abstract: Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical foundations in neural network function approximation settings remain limited. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish a sample complexity of $\tilde{\mathcal{O}}\left( \kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha} \right)$ for offline RL with deep ReLU networks, where $\kappa$ is a measure of distributional shift, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is the target error. Notably, our sample complexity bound holds under two novel considerations, namely the Besov dynamic closure and the correlated structure that arises from value regression in offline RL. While the Besov dynamic closure generalizes the dynamic conditions for offline RL in prior work, the correlated structure renders existing analyses either inapplicable or inefficient. To our knowledge, our work is the first to provide such a comprehensive analysis for offline RL with deep ReLU network function approximation.
One-sentence Summary: The first comprehensive analysis of offline RL with deep ReLU network function approximation under a general dynamic condition and correlated structure
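
As a rough illustration of how the stated bound scales, the sketch below plugs hypothetical values of $\kappa$, $d$, $\alpha$, and $\epsilon$ into the leading-order term $\kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha}$ from the abstract. The constants and logarithmic factors hidden by $\tilde{\mathcal{O}}$ are not modeled, so the numbers only convey the dependence on the problem parameters, not actual sample counts.

```python
# Illustrative sketch only: evaluates the leading-order term of the abstract's
# stated sample-complexity bound, kappa^(1 + d/alpha) * eps^(-(2 + 2d/alpha)),
# for hypothetical values of the distributional-shift coefficient kappa,
# state-action dimension d, Besov smoothness alpha, and target error eps.
# Constants and log factors hidden by the O~ notation are ignored.

def sample_complexity_bound(kappa: float, d: int, alpha: float, eps: float) -> float:
    """Return the leading-order term kappa^(1 + d/alpha) * eps^(-(2 + 2d/alpha))."""
    ratio = d / alpha
    return kappa ** (1 + ratio) * eps ** (-(2 + 2 * ratio))

if __name__ == "__main__":
    # Hypothetical parameters: moderate distributional shift, low-dimensional MDP,
    # Lipschitz-like smoothness (alpha = 1).
    for eps in (0.1, 0.05, 0.01):
        n = sample_complexity_bound(kappa=2.0, d=4, alpha=1.0, eps=eps)
        print(f"eps = {eps:>5}: bound ~ {n:.3e} samples (up to constants and log factors)")
```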