A UCB-type of Approach for Nonstationary MDPs with General Function Approximation

22 Sept 2023 (modified: 25 Mar 2024)
Keywords: nonstationary MDP, general function approximation, eluder dimension
TL;DR: This work proposes a UCB-type algorithm for nonstationary MDPs with general function approximation.
Abstract: Function approximation has been highly successful in reinforcement learning (RL). Although there has been some progress on the theory of nonstationary RL with function approximation under structural assumptions, the existing work on nonstationary RL with general function approximation by \citet{Feng:nonstationary:ICML:2023} studies a confidence-set based algorithm that relies on an oracle to select the optimistic state-action value function within the confidence set, which is computationally inefficient. To mitigate this drawback, we propose an algorithm of the popular UCB type for nonstationary RL with general function approximation. Our algorithm features a restart mechanism and a new design of the bonus term to handle nonstationarity. We then establish a dynamic regret upper bound for the proposed algorithm and instantiate this bound for nonstationary tabular MDPs and nonstationary linear MDPs. To the best of our knowledge, this is the first UCB-type algorithm for nonstationary RL with general function approximation. Our theory contributes to the recent progress on RL with general function approximation.
