Optimal Regret Bounds for Selecting the State Representation in Reinforcement Learning

Odalric-Ambrym Maillard, Phuong Nguyen, Ronald Ortner, Daniil Ryabko

2013 (modified: 11 Nov 2022) | ICML (1) 2013 | Readers: Everyone

Abstract: We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Process (MDP). Rath...
