Bounding Regret in Empirical Games

Steven Jecmen, Arunesh Sinha, Zun Li, Long Tran-Thanh

Published: 03 Apr 2020, Last Modified: 27 Sept 2024 | Thirty-Fourth AAAI Conference on Artificial Intelligence | Readers: Everyone | License: CC BY-ND 4.0

Abstract: Empirical game-theoretic analysis refers to a set of models and techniques for solving large-scale games. However, these techniques lack a quantitative guarantee on the quality of the approximate Nash equilibria (NE) they output. A natural quantitative guarantee for such an approximate NE is its regret in the game (i.e., the best deviation gain). We formulate the computation of this deviation gain as a multi-armed bandit problem, with a new optimization goal unlike those studied in prior work. We propose an efficient algorithm, Super-Arm UCB (SAUCB), for the problem, along with a number of variants. We present sample complexity results as well as extensive experiments showing that SAUCB outperforms several baselines.
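To make the notion of regret as the best deviation gain concrete, the following is a minimal sketch for a two-player normal-form game whose payoffs are assumed to be known exactly. It does not implement SAUCB or the bandit formulation, in which payoffs would have to be estimated from noisy samples. The function name `regret`, the choice of taking the maximum gain over players (one common convention), and the example payoff matrices are assumptions made for illustration.

```python
import numpy as np

def regret(A, B, x, y):
    """Regret of the mixed profile (x, y) in a two-player game.

    A[i, j] and B[i, j] are the row and column player's payoffs when the
    row player picks action i and the column player picks action j.
    Illustrative helper only: assumes exact payoffs, not sampled ones.
    """
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)

    # Expected payoff of each player under the profile itself.
    u_row = x @ A @ y
    u_col = x @ B @ y

    # Expected payoff of every pure deviation against the opponent's mix.
    row_deviations = A @ y
    col_deviations = x @ B

    # Best deviation gain per player; regret is the largest of these.
    gain_row = row_deviations.max() - u_row
    gain_col = col_deviations.max() - u_col
    return max(gain_row, gain_col)

# Hypothetical prisoner's dilemma payoffs (action 0 = cooperate, 1 = defect).
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = A.T

regret_cooperate = regret(A, B, [1, 0], [1, 0])  # both cooperate
regret_defect = regret(A, B, [0, 1], [0, 1])     # both defect
print(regret_cooperate, regret_defect)
```

Under these assumed payoffs, mutual cooperation has positive regret because either player gains by defecting, while mutual defection has zero regret and is therefore an exact NE. In an empirical game the entries of `A` and `B` are only available through simulation, which is where a sampling strategy such as the one the abstract describes becomes relevant.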