A Finite-Sample Analysis of Payoff-Based Independent Learning in Zero-Sum Stochastic Games

Published: 21 Sept 2023, Last Modified: 02 Nov 2023
Venue: NeurIPS 2023 poster
Keywords: Zero-sum stochastic games, payoff-based independent learning, best-response-type dynamics, finite-sample analysis
TL;DR: We study best-response independent learning dynamics in zero-sum matrix games and stochastic games and provide finite-sample convergence guarantees.
Abstract: In this work, we study two-player zero-sum stochastic games and develop a variant of the smoothed best-response learning dynamics that combines independent learning dynamics for matrix games with minimax value iteration for stochastic games. The resulting learning dynamics are payoff-based, convergent, rational, and symmetric between the two players. To the best of our knowledge, our theoretical results present the first last-iterate finite-sample analysis of such independent learning dynamics. To establish the results, we develop a coupled Lyapunov drift approach that captures the evolution of multiple sets of coupled and stochastic iterates, which might be of independent interest.
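For readers unfamiliar with the terminology, the following is a minimal illustrative sketch of payoff-based smoothed best-response dynamics in a zero-sum matrix game. It is not the algorithm analyzed in the paper: the function names, the softmax temperature `tau`, the step sizes `alpha` and `beta`, and the specific update rules are all assumptions chosen for illustration, and the stochastic-game (value iteration) component is omitted entirely. Each player sees only its own realized payoff, keeps a per-action payoff estimate, and moves its policy a small step toward the softmax of that estimate.

```python
import math
import random


def softmax(q, tau):
    """Smoothed best response: softmax of q-values with temperature tau."""
    m = max(q)  # subtract the max for numerical stability
    w = [math.exp((x - m) / tau) for x in q]
    s = sum(w)
    return [x / s for x in w]


def run_dynamics(payoff, steps=5000, tau=0.1, alpha=0.05, beta=0.01, seed=0):
    """Illustrative payoff-based dynamics (hypothetical parameters, not the paper's).

    payoff[a1][a2] is the reward to player 1; player 2 receives its negative.
    """
    rng = random.Random(seed)
    n1, n2 = len(payoff), len(payoff[0])
    pi1, pi2 = [1.0 / n1] * n1, [1.0 / n2] * n2  # start from uniform policies
    q1, q2 = [0.0] * n1, [0.0] * n2              # per-action payoff estimates

    for _ in range(steps):
        # Each player samples an action independently from its own policy.
        a1 = rng.choices(range(n1), weights=pi1, k=1)[0]
        a2 = rng.choices(range(n2), weights=pi2, k=1)[0]
        r = payoff[a1][a2]

        # Payoff-based: each player updates only the estimate of the action
        # it played, using only its own realized payoff.
        q1[a1] += alpha * (r - q1[a1])
        q2[a2] += alpha * (-r - q2[a2])

        # Policies take a small step toward the smoothed best response.
        br1, br2 = softmax(q1, tau), softmax(q2, tau)
        pi1 = [p + beta * (b - p) for p, b in zip(pi1, br1)]
        pi2 = [p + beta * (b - p) for p, b in zip(pi2, br2)]

    return pi1, pi2, q1, q2


if __name__ == "__main__":
    # Matching pennies, a standard zero-sum matrix game.
    matching_pennies = [[1, -1], [-1, 1]]
    print(run_dynamics(matching_pennies))
```

Both players run the same update rule on their own observations, which reflects the symmetry property mentioned in the abstract. The sketch makes no claim about convergence rates; those are the subject of the paper's analysis.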
Submission Number: 10408