A universal scheme for learning

Vivek F. Farias, Ciamac C. Moallemi, Benjamin Van Roy, Tsachy Weissman

Published: 2005, Last Modified: 12 May 2023, ISIT 2005

Abstract: We consider the problem of optimal control of a Kth-order Markov process so as to minimize long-term average cost, a framework with many applications in communications and beyond. Specifically, we wish to do so without knowledge of either the transition kernel or even the order K. We develop and analyze two algorithms, based on the Lempel-Ziv scheme for data compression, that maintain probability estimates along variable-length contexts. We establish that eventually, with probability 1, the optimal action is taken at each context. Further, in the case of the second algorithm, we establish almost sure asymptotic optimality.
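
To illustrate what "probability estimates along variable-length contexts" can look like, here is a minimal sketch of Lempel-Ziv (LZ78) incremental parsing that keeps next-symbol counts at each context node. It is not the paper's algorithm: it omits actions and costs entirely, and the names `Node`, `lz78_context_counts`, and `estimate`, as well as the add-one smoothing, are assumptions made for illustration only.

```python
class Node:
    """One context in the LZ78 parsing tree (hypothetical helper)."""

    def __init__(self):
        self.children = {}  # symbol -> child Node (a longer context)
        self.counts = {}    # symbol -> times it was observed in this context


def lz78_context_counts(sequence):
    """Parse `sequence` into LZ78 phrases, counting next symbols per context.

    Returns the tree root and the number of completed phrases.
    """
    root = Node()
    node = root
    phrases = 0
    for sym in sequence:
        # Record that `sym` followed the current context.
        node.counts[sym] = node.counts.get(sym, 0) + 1
        child = node.children.get(sym)
        if child is None:
            # Unseen extension: the phrase ends here, so grow the tree by
            # one node and restart from the empty context.
            node.children[sym] = Node()
            node = root
            phrases += 1
        else:
            # Seen before: descend to the longer context.
            node = child
    return root, phrases


def estimate(node, sym, alphabet_size):
    """Add-one (Laplace) estimate of P(sym | context); an assumed smoother."""
    total = sum(node.counts.values())
    return (node.counts.get(sym, 0) + 1) / (total + alphabet_size)


root, phrases = lz78_context_counts("aababc")  # parses as a | ab | abc
p = estimate(root.children["a"], "b", alphabet_size=3)
```

Because the context depth grows with the data, a scheme of this kind never needs the order K in advance, which is the property the abstract relies on.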
