Abstract: Two-player single-controller zero-sum stochastic games are a class of zero-sum dynamic games with Markovian state dynamics, where only one player controls the state transitions. Design of optimal strategies for such games with large state and action spaces relies on computationally demanding dynamic programming. Linear programming can also be used, but the number of constraints equals the number of states. This paper presents a class of simple suboptimal strategies that can be constructed by playing a certain repeated static game where neither player observes the specific mixed strategies used by the other player at each round. We quantify the suboptimality of the resulting strategies and show that, when the two players honestly follow the prescribed protocol, each player can exploit the regularity or predictability of the moves of the other player, and thus speed up convergence to the minimax value.
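To make the idea of learning through a repeated static game concrete, the following minimal sketch runs two Hedge (multiplicative-weights) learners against each other in a small zero-sum matrix game. It is not the construction proposed in the paper: Hedge is a generic no-regret rule substituted here for illustration, and the payoff matrix `A`, the function names, and the step-size choice are all hypothetical. In keeping with the abstract, each player observes only the realized action of the opponent in each round, never the opponent's mixed strategy.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def hedge_self_play(A, rounds, seed=0):
    """Repeated play of the matrix game A with Hedge on both sides.

    Assumption (not from the paper): the row player pays A[i][j] and
    minimizes, the column player maximizes, and both use Hedge with a
    fixed step size. Returns the empirical action frequencies.
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    flat = [a for row in A for a in row]
    span = (max(flat) - min(flat)) or 1.0  # payoff range, used to scale steps
    eta_row = math.sqrt(8.0 * math.log(m) / rounds) / span
    eta_col = math.sqrt(8.0 * math.log(n) / rounds) / span

    cum_loss = [0.0] * m  # cumulative payment of each row action
    cum_gain = [0.0] * n  # cumulative payoff of each column action
    row_counts = [0] * m
    col_counts = [0] * n

    for _ in range(rounds):
        x = softmax([-eta_row * c for c in cum_loss])
        y = softmax([eta_col * g for g in cum_gain])
        i = rng.choices(range(m), weights=x)[0]
        j = rng.choices(range(n), weights=y)[0]
        row_counts[i] += 1
        col_counts[j] += 1
        # Each player updates using only the opponent's realized action.
        for k in range(m):
            cum_loss[k] += A[k][j]
        for k in range(n):
            cum_gain[k] += A[i][k]

    x_bar = [c / rounds for c in row_counts]
    y_bar = [c / rounds for c in col_counts]
    return x_bar, y_bar


def security_levels(A, x_bar, y_bar):
    """Return (lower, upper): what y_bar and x_bar each guarantee."""
    m, n = len(A), len(A[0])
    upper = max(sum(x_bar[i] * A[i][j] for i in range(m)) for j in range(n))
    lower = min(sum(A[i][j] * y_bar[j] for j in range(n)) for i in range(m))
    return lower, upper


# Hypothetical 2x2 game; its minimax value is 1/5.
A = [[2.0, -1.0], [-1.0, 1.0]]
x_bar, y_bar = hedge_self_play(A, rounds=20000)
lower, upper = security_levels(A, x_bar, y_bar)
print(f"value bracket: [{lower:.3f}, {upper:.3f}]")
```

The minimax value always lies between `lower` and `upper`, so the width of that bracket measures how suboptimal the empirical strategies are. For plain Hedge the bracket shrinks roughly like one over the square root of the number of rounds; the abstract's claim of faster convergence when both players follow the prescribed protocol concerns the paper's own scheme, which this sketch does not implement.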