Finite time bounds for sampling based fitted value iteration

Csaba Szepesvári, Rémi Munos

2005 (modified: 11 Nov 2022) · ICML 2005 · Readers: Everyone

Abstract: In this paper we consider sampling-based fitted value iteration for discounted, large (possibly infinite) state-space, finite-action Markovian Decision Problems where only a generative model of the transition probabilities and rewards is available. At each step, the image of the current estimate of the optimal value function under a Monte-Carlo approximation to the Bellman operator is projected onto some function space. PAC-style bounds on the weighted Lp-norm approximation error are obtained as a function of the covering number and the approximation power of the function space, the iteration number, and the sample size.
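The iteration described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy MDP on [0, 1] with two actions (`generative_model`), the polynomial function space (`features`), uniform sampling of base states, and a least-squares (L2) projection. Each iteration draws base states, estimates the Bellman backup at each one by averaging sampled next states and rewards, takes the maximum over actions, and fits the function space to the resulting targets.

```python
import numpy as np

def generative_model(x, a, rng):
    """Hypothetical toy MDP on [0, 1]: action 0 drifts left, action 1 drifts right."""
    drift = 0.1 if a == 1 else -0.1
    y = np.clip(x + drift + 0.05 * rng.standard_normal(x.shape), 0.0, 1.0)
    r = -np.abs(y - 0.5)  # reward is highest near the middle of the interval
    return y, r

def features(x, degree=4):
    """Polynomial basis 1, x, ..., x^degree (an assumed function space)."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def fitted_value_iteration(n_iters=30, n_states=200, n_next=10,
                           gamma=0.9, degree=4, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(degree + 1)  # initial value estimate V_0 = 0
    for _ in range(n_iters):
        # Draw base states from the sampling distribution (uniform here).
        xs = rng.uniform(0.0, 1.0, n_states)
        q = np.empty((2, n_states))
        for a in (0, 1):
            backup = np.zeros(n_states)
            for _ in range(n_next):
                # Monte-Carlo estimate of E[r + gamma * V(y)] from the generative model.
                y, r = generative_model(xs, a, rng)
                backup += r + gamma * (features(y, degree) @ w)
            q[a] = backup / n_next
        targets = q.max(axis=0)  # empirical Bellman backup at each base state
        # Project the backed-up values onto the function space by least squares.
        w = np.linalg.lstsq(features(xs, degree), targets, rcond=None)[0]
    return w

if __name__ == "__main__":
    w = fitted_value_iteration()
    print("V(0.0), V(0.5), V(1.0) =", features(np.array([0.0, 0.5, 1.0])) @ w)
```

The three quantities the bounds depend on appear as knobs in this sketch: the richness of the function space (`degree`), the number of iterations (`n_iters`), and the sample sizes (`n_states` base states and `n_next` next-state draws per state-action pair).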
