Open-Ended Learning in General-Sum Games: The Role of Diversity in Correlated Equilibrium

24 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Supplementary Material: pdf
Primary Area: reinforcement learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Correlated Equilibrium, Policy Diversity, PSRO
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: This work focuses on the challenging and crucial task of identifying and selecting equilibria in $n$-player general-sum games. Policy-Space Response Oracles (PSRO) serves as a comprehensive framework for tackling complex games by leveraging the concept of the meta-game. However, prior research on PSRO has mainly concentrated on solving two-player zero-sum games. Extended approaches such as JPSRO and $\alpha$-Rank can address multi-player general-sum games, and these methods theoretically ensure uniqueness and convergence. Nonetheless, a noticeable gap often exists between the joint policy distribution derived by the solver and the target equilibrium, which can undermine the robustness of the joint policy. Within the PSRO framework, diversity characterizes the distinctions among policies in the population, reflecting how thoroughly players have explored the policy space. Consequently, allocating greater sampling probabilities (meta-strategy) to more diverse policies encourages players to employ more exploratory policies, thereby mitigating the risk of being exploited. We begin by incorporating diversity measures into solving equilibria for $n$-player meta-games and introduce a novel equilibrium concept, called Diverse (C)CE, whose objective is to maximize the sum of each player's expected diversity. In alignment with this, we present a policy training algorithm, Diverse Correlated Oracle (DCO), which effectively links the policy training dynamics to the joint policy distribution. Experimental results on a range of multi-player, general-sum games demonstrate that our algorithm outperforms JPSRO and $\alpha$-Rank and brings the joint policy distribution notably closer to the target equilibrium.
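Concretely, maximizing total expected diversity subject to the standard CCE incentive constraints is a linear program over joint policy distributions. Below is a minimal sketch of that formulation for a two-player meta-game, not the paper's DCO implementation: the payoff matrices U1, U2, the per-policy diversity scores d1, d2, and the helper diverse_cce are illustrative assumptions, and the paper's actual diversity measure may differ.

# Minimal sketch (assumptions as stated above): solve for a joint
# distribution x over policy pairs that maximizes the sum of each
# player's expected diversity, subject to CCE constraints.
import numpy as np
from scipy.optimize import linprog

def diverse_cce(U1, U2, d1, d2):
    """U1, U2: (m, n) payoff matrices; d1 (m,), d2 (n,): diversity scores."""
    m, n = U1.shape
    # Variable x is flattened row-major: x[i*n + j] = P(a1 = i, a2 = j).
    # Maximize sum_{i,j} x_ij * (d1[i] + d2[j])  ->  minimize the negation.
    c = -(d1[:, None] + d2[None, :]).ravel()

    A_ub, b_ub = [], []
    # Player 1 CCE constraints: for every deviation k,
    # sum_{i,j} x_ij * (U1[k, j] - U1[i, j]) <= 0.
    for k in range(m):
        A_ub.append((U1[k][None, :] - U1).ravel())
        b_ub.append(0.0)
    # Player 2 CCE constraints: for every deviation l,
    # sum_{i,j} x_ij * (U2[i, l] - U2[i, j]) <= 0.
    for l in range(n):
        A_ub.append((U2[:, l][:, None] - U2).ravel())
        b_ub.append(0.0)

    # x must be a probability distribution.
    A_eq, b_eq = [np.ones(m * n)], [1.0]

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                  bounds=[(0, 1)] * (m * n), method="highs")
    return res.x.reshape(m, n)

# Example on a small random general-sum meta-game.
rng = np.random.default_rng(0)
U1, U2 = rng.uniform(size=(3, 3)), rng.uniform(size=(3, 3))
d1, d2 = rng.uniform(size=3), rng.uniform(size=3)
x = diverse_cce(U1, U2, d1, d2)  # joint policy distribution (meta-strategy)

Setting d1 and d2 to zero recovers an arbitrary feasible CCE; any other linear objective (e.g., social welfare) could be swapped into c in the same way.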
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8733