A UCB-type of Approach for Nonstationary MDPs with General Function Approximation

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: nonstationary MDP, general function approximation, eluder dimension
TL;DR: This work proposes a UCB-type algorithm for nonstationary MDPs with general function approximation.
Abstract: Function approximation has been highly successful in reinforcement learning (RL). Although some progress has been made on the theory of nonstationary RL with function approximation under structural assumptions, the existing work on nonstationary RL with general function approximation, \citet{Feng:nonstationary:ICML:2023}, studies a confidence-set-based algorithm that relies on an oracle to select the optimistic state-action value function within the confidence set, which is computationally inefficient. To mitigate this drawback, we propose an algorithm of the popular UCB type for nonstationary RL with general function approximation. Our algorithm features a restart mechanism and a new bonus-term design to handle nonstationarity. We establish a dynamic regret upper bound for the proposed algorithm and instantiate it for nonstationary tabular MDPs and nonstationary linear MDPs. To the best of our knowledge, this is the first UCB-type algorithm for nonstationary RL with general function approximation. Our theory contributes to the recent progress on RL with general function approximation.
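To make the two ingredients named in the abstract concrete (periodic restarts and an optimism bonus), here is a minimal sketch for the tabular special case only. It is not the submission's algorithm: the count-based bonus `bonus_scale * H / sqrt(n)`, the fixed restart period `restart_every`, the function `restart_ucb_vi`, and the toy environment `drifting_env` are all assumptions made for illustration, and the bonus for general function approximation described in the abstract is not reproduced here.

```python
import numpy as np

def restart_ucb_vi(env_step, n_states, n_actions, horizon, n_episodes,
                   restart_every, bonus_scale=1.0, seed=0):
    """Illustrative tabular optimistic value iteration with periodic restarts.

    env_step(k, h, s, a, rng) -> (reward in [0, 1], next_state) is a
    hypothetical interface; k is the episode index, so the environment
    may change over time.
    """
    rng = np.random.default_rng(seed)
    S, A, H = n_states, n_actions, horizon
    returns = []
    for k in range(n_episodes):
        if k % restart_every == 0:
            # Restart: discard all past data so stale samples from an
            # older environment no longer influence the estimates.
            counts = np.zeros((H, S, A))
            reward_sum = np.zeros((H, S, A))
            trans_counts = np.zeros((H, S, A, S))
        # Optimistic backward induction on the empirical model.
        Q = np.zeros((H + 1, S, A))
        V = np.zeros((H + 1, S))
        for h in range(H - 1, -1, -1):
            n = np.maximum(counts[h], 1.0)
            r_hat = reward_sum[h] / n
            p_hat = trans_counts[h] / n[..., None]
            bonus = bonus_scale * H * np.sqrt(1.0 / n)  # assumed bonus form
            Q[h] = np.minimum(r_hat + p_hat @ V[h + 1] + bonus, H - h)
            Q[h][counts[h] == 0] = H - h  # unvisited pairs: fully optimistic
            V[h] = Q[h].max(axis=1)
        # Act greedily with respect to the optimistic Q and record the data.
        s, total = 0, 0.0
        for h in range(H):
            a = int(np.argmax(Q[h, s]))
            r, s_next = env_step(k, h, s, a, rng)
            counts[h, s, a] += 1
            reward_sum[h, s, a] += r
            trans_counts[h, s, a, s_next] += 1
            total += r
            s = s_next
        returns.append(total)
    return np.array(returns)

def drifting_env(k, h, s, a, rng):
    # Toy nonstationary problem: one state, two actions, and the
    # rewarding action switches at episode 100.
    best = 0 if k < 100 else 1
    return (1.0 if a == best else 0.0), 0

returns = restart_ucb_vi(drifting_env, n_states=1, n_actions=2, horizon=1,
                         n_episodes=200, restart_every=50)
```

In this toy run the restart at episode 100 happens to coincide with the change point, so the learner drops its outdated estimates immediately. In general the restart period would have to be tuned to the amount of variation in the environment, which this sketch does not attempt.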
Primary Area: reinforcement learning
Submission Number: 6001