TL;DR: We meta-learn regret minimizers that converge quickly in self-play on a desired distribution of games.
Abstract: Regret minimization is a general approach to online optimization which plays a crucial role in many algorithms for approximating Nash equilibria in two-player zero-sum games.
The literature mainly focuses on solving individual games in isolation. In practice, however, players often face a distribution of similar but distinct games, for example when trading correlated assets on the stock market or when refining strategies in subgames of a much larger game.
Recently, offline meta-learning was used to accelerate one-sided equilibrium finding on such distributions.
We build on this work, extending the framework to the more challenging \emph{self-play} setting, which underlies most state-of-the-art equilibrium-approximation algorithms for large-scale domains.
When selecting a strategy, our method uniquely integrates information across all decision states, promoting \emph{global} communication in contrast to the traditional local regret decomposition.
Empirical evaluation on normal-form games and river poker subgames shows that our meta-learned algorithms considerably outperform state-of-the-art regret-minimization algorithms.
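For context, the sketch below illustrates the standard regret-minimization self-play loop the abstract refers to, not the meta-learned method proposed here: regret matching run by both players on a two-player zero-sum normal-form game, whose average strategies converge to a Nash equilibrium. The game matrix and iteration count are illustrative assumptions.

```python
# Minimal sketch (baseline only, not the paper's meta-learned algorithm):
# regret matching in self-play on a two-player zero-sum normal-form game.
import numpy as np

def regret_matching(cum_regret):
    """Strategy proportional to positive cumulative regrets (uniform if none)."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full_like(pos, 1.0 / len(pos))

def self_play(payoff, iterations=10_000):
    """Run regret-matching self-play; payoff[i, j] is the row player's utility."""
    n_rows, n_cols = payoff.shape
    regret_row, regret_col = np.zeros(n_rows), np.zeros(n_cols)
    avg_row, avg_col = np.zeros(n_rows), np.zeros(n_cols)
    for _ in range(iterations):
        x = regret_matching(regret_row)   # row player's current strategy
        y = regret_matching(regret_col)   # column player's current strategy
        avg_row += x
        avg_col += y
        # Value of each pure action against the opponent's current mixture.
        row_values = payoff @ y           # row player maximizes payoff
        col_values = -(x @ payoff)        # column player minimizes payoff
        regret_row += row_values - x @ row_values
        regret_col += col_values - y @ col_values
    # Average strategies converge to a Nash equilibrium in zero-sum games.
    return avg_row / iterations, avg_col / iterations

if __name__ == "__main__":
    # Rock-paper-scissors: the unique Nash equilibrium is uniform for both players.
    rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
    x_bar, y_bar = self_play(rps)
    print("row player average strategy:", np.round(x_bar, 3))
    print("column player average strategy:", np.round(y_bar, 3))
```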
Primary Area: Theory->Online Learning and Bandits
Keywords: game theory, imperfect information, equilibrium
Submission Number: 4007