Offline Equilibrium Finding

TMLR Paper1098 Authors

27 Apr 2023 (modified: 17 Sept 2024) | Rejected by TMLR | CC BY 4.0
Abstract: Offline reinforcement learning (offline RL) is an emerging field that has recently attracted significant interest across a wide range of application domains, owing to its ability to learn policies from previously collected datasets. The success of offline RL has paved the way for tackling previously intractable real-world problems, but so far, only in single-agent scenarios. Given its potential, our goal is to generalize this paradigm to the multiplayer-game setting. To this end, we introduce a novel problem, called offline equilibrium finding (OEF), and construct various types of datasets spanning a wide range of games using several established methods. To solve the OEF problem, we design a model-based framework capable of directly adapting any online equilibrium finding algorithm to the OEF setting while making minimal changes. We adapt the three most prominent contemporary online equilibrium finding algorithms to the context of OEF, resulting in three model-based variants: OEF-PSRO and OEF-CFR, which generalize the widely used algorithms PSRO and Deep CFR for computing Nash equilibria, and OEF-JPSRO, which generalizes JPSRO for computing (coarse) correlated equilibria. Additionally, we combine the behavior cloning policy with the model-based policy to enhance performance and provide a theoretical guarantee regarding the quality of the solution obtained. Extensive experimental results demonstrate the superiority of our approach over traditional offline RL algorithms and highlight the importance of using model-based methods for OEF problems. We hope that our work will contribute to the advancement of research in large-scale equilibrium finding.
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We reformulate the description of the theoretical analysis and add a sample analysis. We provide several new offline and online methods for selecting the combination parameter.
Assigned Action Editor: ~Aleksandra_Faust1
Submission Number: 1098