On the Role of General Function Approximation in Offline Reinforcement Learning

Chenjie Mao; Qiaosheng Zhang; Zhen Wang; Xuelong Li

Published: 16 Jan 2024, Last Modified: 13 Apr 2024. Venue: ICLR 2024 spotlight.

Keywords: reinforcement learning theory, offline reinforcement learning, general function approximation, learnability, minimax lower bounds

TL;DR: This paper clarifies the use of general function approximation in offline RL and proposes lower bounds for learnability.

Abstract: We study offline reinforcement learning (RL) with general function approximation. General function approximation is a powerful tool for algorithm design and analysis, but adapting it to offline RL is challenging: approximation targets vary, and the accompanying assumptions blur the real meaning of the function assumptions. In this paper, we formulate and clarify the treatment of general function approximation in offline RL in two respects: (1) analyzing different types of assumptions and their practical usage, and (2) understanding its role as a restriction on the underlying MDPs from an information-theoretic perspective. Additionally, we introduce a new insight for establishing lower bounds: one can exploit model-realizability to establish general-purpose lower bounds that generalize to other functions. Building on this insight, we propose two generic lower bounds that contribute to a better understanding of offline RL with general function approximation.

Primary Area: reinforcement learning

Submission Number: 7145
