RL-ARNE: A Reinforcement Learning Algorithm for Computing Average Reward Nash Equilibrium of Nonzero-Sum Stochastic Games

Published: 01 Jan 2024 · Last Modified: 15 May 2025 · IEEE Trans. Autom. Control. 2024 · CC BY-SA 4.0
Abstract: Stochastic games model the strategic interactions among two or more players that unfold over a sequence of stages. In this article, we focus on computing the average reward Nash equilibrium (ARNE) of a nonzero-sum stochastic game when the transition probabilities of the game and the reward structure of the players are unknown. Current state-of-the-art reinforcement learning (RL) algorithms for computing the ARNE of a nonzero-sum stochastic game require solving a matrix game at each state of the game in every iteration, a problem that is PPAD-complete and incurs a memory complexity exponential in the number of players. We instead use temporal difference error minimization and stochastic approximation to develop a scalable RL algorithm that computes an ARNE of nonzero-sum stochastic games, and we prove its convergence to an ARNE. We evaluate the performance of our algorithm using an attacker–defender game modeled on a real-world ransomware dataset.
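To illustrate the building block the abstract refers to, the sketch below shows standard single-agent average-reward (differential) TD(0) learning, where a temporal-difference error drives stochastic-approximation updates of both a differential value function and an average-reward estimate. This is a minimal textbook sketch under assumed step sizes and a toy deterministic chain, not the RL-ARNE algorithm from the paper, which extends such updates to the multi-player equilibrium setting.

```python
import random

def differential_td(transitions, rewards, n_states, steps=20000,
                    alpha=0.05, beta=0.01, seed=0):
    """Average-reward (differential) TD(0) sketch: learns a differential
    value function V and an average-reward estimate rho from one
    trajectory via stochastic approximation on the TD error."""
    rng = random.Random(seed)  # unused in this deterministic toy chain
    V = [0.0] * n_states
    rho = 0.0  # running estimate of the long-run average reward
    s = 0
    for _ in range(steps):
        s_next = transitions[s]
        r = rewards[s]
        # TD error under the average-reward criterion:
        delta = r - rho + V[s_next] - V[s]
        V[s] += alpha * delta      # value-function update
        rho += beta * delta        # stochastic-approximation update of rho
        s = s_next
    return V, rho

# Toy example (hypothetical): deterministic 2-state cycle with rewards
# 1 and 0, so the true long-run average reward is 0.5.
transitions = {0: 1, 1: 0}
rewards = {0: 1.0, 1: 0.0}
V, rho = differential_td(transitions, rewards, n_states=2)
print(round(rho, 2))  # estimate of the long-run average reward
```

The average-reward estimate `rho` converges to the cycle's true mean reward; in the multi-player game setting, analogous TD errors are minimized per player while avoiding the per-state matrix-game solves that the abstract identifies as the bottleneck.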