A-PSRO: A Unified Strategy Learning Method with Advantage Metric for Normal-form Games

Yudong Hu; Haoran Li; Congying Han; Tiande Guo; Bonan Li; Mingqiang Li

Published: 01 May 2025, Last Modified: 23 Jul 2025, ICML 2025 poster, CC BY 4.0

TL;DR: We introduce the advantage metric to improve the PSRO framework for solving normal-form games.

Abstract: Computing a Nash equilibrium in normal-form games with large-scale strategy spaces presents significant challenges. Open-ended learning frameworks, such as PSRO and its variants, have emerged as effective solutions. However, these methods often lack an efficient metric for evaluating strategy improvement, which limits their effectiveness in approximating equilibria. In this paper, we introduce a novel evaluative metric called Advantage, which possesses desirable properties inherently connected to the Nash equilibrium, ensuring that each strategy update approaches equilibrium. Building upon this, we propose the Advantage Policy Space Response Oracle (A-PSRO), a unified open-ended learning framework applicable to both zero-sum and general-sum games. A-PSRO uses the Advantage as a refined evaluation metric, which gives agents a consistent learning objective in normal-form games. Experiments show that A-PSRO significantly reduces exploitability in zero-sum games and improves rewards in general-sum games, outperforming existing algorithms and validating its practical effectiveness.
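To make the exploitability measure mentioned in the abstract concrete, the following is a minimal sketch for a two-player zero-sum matrix game. It assumes one common convention (the sum of both players' best-response gains, which is zero exactly at a Nash equilibrium); the paper may normalize differently. The function name `exploitability` and the rock-paper-scissors payoff matrix are illustrative assumptions, not taken from the paper, and the sketch does not implement the Advantage metric or A-PSRO.

```python
def exploitability(payoff, row_strategy, col_strategy):
    """Sum of best-response gains in a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    its negation. This convention is an assumption of the sketch.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Row player's expected payoff for each pure row against col_strategy.
    row_values = [
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    ]
    # Row player's expected payoff for each pure column against row_strategy.
    col_values = [
        sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    ]
    # The row player best-responds by maximizing, the column player by
    # minimizing the row player's payoff; the gap is the exploitability.
    return max(row_values) - min(col_values)


# Illustrative game: rock-paper-scissors (rows and columns ordered R, P, S).
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
uniform = [1 / 3, 1 / 3, 1 / 3]
rock = [1.0, 0.0, 0.0]

expl_uniform = exploitability(RPS, uniform, uniform)
expl_rock = exploitability(RPS, rock, rock)
```

Under this convention the uniform profile, which is the equilibrium of rock-paper-scissors, has exploitability zero, while the profile where both players always play rock is exploitable by each opponent switching to paper.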

Lay Summary: Game theory primarily studies strategic interactions among multiple rational agents, and it can be used to explain real-world scenarios in politics, economics, and common games such as chess and card games. A Nash equilibrium represents a stable state achieved through strategic improvements by these agents and is often considered the strongest strategy in a game. Thus, solving for a Nash equilibrium is equivalent to finding the optimal solution of the game. Previous research proposed PSRO as an efficient algorithm for computing Nash equilibria, but its efficiency is affected by the randomness in strategy exploration. This paper introduces the advantage function as an evaluation metric for strategy exploration. With favorable theoretical properties, it accelerates the computation of Nash equilibria. Based on this, we propose the A-PSRO algorithm, which significantly improves equilibrium solving in games.

Primary Area: Theory->Game Theory

Keywords: PSRO, Game Theory, Nash Equilibrium

Submission Number: 4373
