Fast Rates for the Regret of Offline Reinforcement Learning

2021 (modified: 24 Feb 2022), COLT 2021, Readers: Everyone
Abstract: We study the regret of reinforcement learning from offline data generated by a fixed behavior policy in an infinite-horizon discounted Markov decision process (MDP). While existing analyses of comm...