Learning Nash in Constrained Markov Games With an α-Potential

Published: 2024 · Last Modified: 14 Nov 2025 · IEEE Control Systems Letters, 2024 · CC BY-SA 4.0
Abstract: We develop a best-response algorithm for solving constrained Markov games under a limited violation of the potential-game property: changes in an agent's value function due to unilateral policy alterations are captured by a potential function up to an error $\alpha$. We show that a stationary $\epsilon$-approximate constrained Nash policy exists whenever the set of feasible stationary policies is non-empty. In our setting, each agent has access to an efficient probably approximately correct (PAC) solver for constrained Markov decision processes, which it uses to generate a best-response policy against the other agents' previous policies. For an accuracy threshold $\epsilon > 4\alpha$, the best-response dynamics provably converge to an $\epsilon$-Nash policy in finite time with probability at least $1-\delta$, at the expense of polynomial sample-complexity bounds that scale with the reciprocals of $\epsilon$ and $\delta$.
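To make the dynamics concrete, the following is a minimal sketch, assuming a finite normal-form $\alpha$-potential game and abstracting the PAC constrained-MDP solver as an exact best-response oracle over finitely many candidate policies; the construction, names, and parameter values are illustrative and not taken from the paper.

```python
# Illustrative sketch (not the paper's algorithm): best-response dynamics
# on a finite alpha-potential game. The PAC constrained-MDP solver is
# abstracted as an exact best-response oracle; all names are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_POLICIES = 2, 5
ALPHA = 0.01   # bound on the violation of the potential-game property
EPS = 0.05     # accuracy threshold; the paper requires EPS > 4 * ALPHA

# Shared potential Phi plus per-agent perturbations bounded by ALPHA / 2,
# so unilateral utility changes match potential changes up to ALPHA.
phi = rng.random((N_POLICIES, N_POLICIES))
utils = np.stack([
    phi + rng.uniform(-ALPHA / 2, ALPHA / 2, phi.shape)
    for _ in range(N_AGENTS)
])

def best_response(agent, profile):
    """Agent's value-maximizing policy index with the others held fixed."""
    values = []
    for p in range(N_POLICIES):
        trial = list(profile)
        trial[agent] = p
        values.append(utils[(agent, *trial)])
    return int(np.argmax(values)), max(values)

profile = [0] * N_AGENTS
while True:
    improved = False
    for i in range(N_AGENTS):
        br, br_val = best_response(i, profile)
        if br_val > utils[(i, *profile)] + EPS:  # unilateral gain > EPS
            profile[i] = br                      # adopt the best response
            improved = True
    if not improved:
        break  # no agent can gain more than EPS: an EPS-approximate Nash

print("eps-Nash profile:", profile)
```

Each adopted best response raises the deviating agent's value by more than $\epsilon$, and hence the shared potential by more than $\epsilon - \alpha > 0$; since the potential is bounded, the loop terminates after finitely many rounds, mirroring the paper's finite-time guarantee.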