An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions

Published: 2019, Last Modified: 17 May 2023, Venue: Allerton 2019, Readers: Everyone
Abstract: It has long been a challenging problem to design algorithms for Markov decision processes (MDPs) with continuous states and actions that are provably approximately optimal and can provide an arbitrarily good approximation for any MDP. In this paper, we propose an empirical value learning algorithm for average MDPs with continuous states and actions that combines empirical value iteration with non-parametric function approximation and approximation of the transition probability distribution by kernel density estimation. We view each iteration as the application of a random operator and argue convergence using the probabilistic contraction analysis method that the authors (along with others) have recently developed.
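To make the ingredients named in the abstract concrete, the following is a minimal sketch of relative value learning on a toy one-dimensional problem. It is not the authors' algorithm. The dynamics `simulate`, the stage cost `cost`, the Gaussian kernels, the bandwidths `h` and `h_kde`, the discretized action grid, and the choice to normalize by subtracting the minimum are all assumptions made for illustration. The sketch draws next-state samples from a simulator, resamples them from a Gaussian kernel density estimate, evaluates the current value estimate at those points with Nadaraya-Watson kernel regression, and applies a relative (normalized) Bellman update.

```python
import numpy as np

def simulate(s, a, rng, m):
    # Hypothetical toy dynamics on [0, 1]: move toward the action, plus noise.
    nxt = 0.5 * s + 0.5 * a + 0.1 * rng.standard_normal(m)
    return np.clip(nxt, 0.0, 1.0)

def cost(s, a):
    # Hypothetical non-negative stage cost.
    return (s - 0.3) ** 2 + 0.1 * a ** 2

def kernel_smooth(x_query, x_train, y_train, h):
    # Nadaraya-Watson regression with a Gaussian kernel of bandwidth h.
    w = np.exp(-0.5 * ((x_query[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)

def relative_value_learning(n_states=30, n_actions=11, m=20, iters=50,
                            h=0.1, h_kde=0.05, seed=0):
    rng = np.random.default_rng(seed)
    states = np.sort(rng.uniform(0.0, 1.0, n_states))  # sampled state points
    actions = np.linspace(0.0, 1.0, n_actions)         # assumed action grid
    v = np.zeros(n_states)                             # relative value estimate
    gain = 0.0
    for _ in range(iters):
        q = np.empty((n_states, n_actions))
        for i, s in enumerate(states):
            for j, a in enumerate(actions):
                nxt = simulate(s, a, rng, m)
                # Draw from a Gaussian KDE fitted to the simulated next states:
                # perturb each sample by kernel noise of bandwidth h_kde.
                nxt_kde = np.clip(nxt + h_kde * rng.standard_normal(m), 0.0, 1.0)
                # Monte Carlo estimate of the expected next-state value.
                q[i, j] = cost(s, a) + kernel_smooth(nxt_kde, states, v, h).mean()
        tv = q.min(axis=1)   # empirical Bellman update (cost minimization)
        gain = tv.min()      # offset removed by the normalization
        v = tv - gain        # relative value: minimum pinned at zero
    return states, v, gain

states, v, gain = relative_value_learning()
```

Under these assumptions, `gain` can be read as a rough estimate of the optimal average cost and `v` as a relative value function on the sampled states. Because every iteration uses fresh random samples, the update behaves like a random operator, so the iterates fluctuate around a limit rather than converging exactly.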