Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning

2013 (modified: 11 Nov 2022), ICML (1) 2013, Readers: Everyone
Abstract: We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Process (MDP). Rath...