Online Double Oracle

Le Cong Dinh; Stephen Marcus McAleer; Zheng Tian; Nicolas Perez-Nieves; Oliver Slumbers; David Henry Mguni; Jun Wang; Haitham Bou Ammar; Yaodong Yang

Published: 04 Oct 2022, Last Modified: 17 Sept 2024, Accepted by TMLR

Abstract: Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) from game theory. Our method, Online Double Oracle (ODO), provably converges to a Nash equilibrium (NE). Most importantly, unlike standard DO, ODO is rational in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{k \log(k)/T})$, where $k$ is not the total number of pure strategies but rather the size of the effective strategy set. In many applications, we show empirically that $k$ depends linearly on the support size of the NE. On tens of different real-world matrix games, ODO outperforms DO, PSRO, and no-regret algorithms such as Multiplicative Weights Update by a significant margin, both in convergence rate to an NE and in average payoff against strategic adversaries.
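The abstract describes combining a no-regret learner with Double Oracle but does not spell out the update rule, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes the row player runs Multiplicative Weights Update on a small effective strategy set, queries a best-response oracle against the opponent's average play in the current time window, and restarts the weights and the window whenever a strictly better pure strategy outside the set is found. The function name `odo_row_player_sketch`, the step size `eta`, the restart rule, and the loss-matrix convention are all hypothetical choices made for illustration.

```python
import math


def odo_row_player_sketch(A, opponent_cols, eta=0.5):
    """Illustrative sketch (assumed details, not the paper's code).

    A[i][j] is the loss the row player pays when playing row i against
    column j. opponent_cols is the sequence of columns the opponent plays.
    Returns the effective strategy set and the average expected loss.
    """
    n_rows, n_cols = len(A), len(A[0])
    effective = [0]                 # assumption: start from an arbitrary row
    weights = {0: 1.0}              # MWU weights over the effective set only
    window_counts = [0] * n_cols    # opponent's empirical play in this window
    total_loss = 0.0

    for j in opponent_cols:
        # Mixed strategy over the effective set.
        z = sum(weights.values())
        probs = {i: w / z for i, w in weights.items()}
        total_loss += sum(p * A[i][j] for i, p in probs.items())

        # Multiplicative Weights Update on the observed losses.
        for i in effective:
            weights[i] *= math.exp(-eta * A[i][j])

        # Best-response oracle against the opponent's play in the window.
        window_counts[j] += 1
        window_loss = [
            sum(A[i][c] * window_counts[c] for c in range(n_cols))
            for i in range(n_rows)
        ]
        best_in_set = min(effective, key=lambda i: window_loss[i])
        best_overall = min(range(n_rows), key=lambda i: window_loss[i])

        # Expand only if a row outside the set is strictly better,
        # then restart the learner and the time window.
        if window_loss[best_overall] < window_loss[best_in_set] - 1e-12:
            effective.append(best_overall)
            weights = {i: 1.0 for i in effective}
            window_counts = [0] * n_cols

    return effective, total_loss / len(opponent_cols)


# Usage: rock-paper-scissors losses for the row player, opponent always
# plays rock (column 0). Rows and columns are ordered rock, paper, scissors.
rps = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]
effective_set, avg_loss = odo_row_player_sketch(rps, [0] * 200)
print(effective_set, avg_loss)
```

In this toy run the learner never needs to add scissors: the effective set stays smaller than the full strategy set, which is the quantity $k$ that the regret bound in the abstract depends on.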

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: We have updated the paper to a camera-ready version.

Code: https://github.com/npvoid/OnlineDoubleOracle

Assigned Action Editor: Michal Valko

License: Creative Commons Attribution 4.0 International (CC BY 4.0)

Submission Number: 294
