2013 (modified: 11 Nov 2022), ICML (1) 2013, Readers: Everyone
Abstract: We consider an agent interacting with an environment in a single stream of actions, observations, and rewards, with no reset. This process is not assumed to be a Markov Decision Process (MDP). Rath...