Best of Both Worlds: Regret Minimization versus Minimax Play

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: Combining regret minimization and minimax play makes it possible to exploit weak opponents while risking only constant expected loss.
Abstract: In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy, and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy supports every action. In the context of zero-sum games with min-max value zero, in both normal and extensive form, we show that our results make it possible to risk at most $O(1)$ loss while gaining $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.
Lay Summary: When repeatedly playing a game such as Rock-Paper-Scissors or Poker against an unknown opponent, a dilemma arises: should one (a) compute a strong strategy and play it in every round, or (b) run a learning algorithm that automatically adapts to the opponent's play over time? Approach (a) guarantees that, in expectation, one loses nothing against any opponent. Yet this static approach may miss out on systematically winning against a weak opponent. Approach (b) would indeed systematically win against such weak opponents, but it risks losing a significant amount during the slow learning process. In this paper, we show that, perhaps surprisingly, it is possible to essentially guarantee the benefits of both approaches in many games of interest, even if one does not observe all the information the learning algorithm could benefit from. This implies that in such games, one can indeed hope to systematically win against weak opponents while risking only a small expected loss, even if the opponent turns out to be strong.
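The two approaches contrasted in the lay summary can be illustrated with a small simulation. The sketch below (not code from the paper; the opponent bias, learning rate, and use of a simplified Exp3 update without explicit exploration mixing are all illustrative assumptions) pits a static minimax player against a bandit no-regret learner in Rock-Paper-Scissors, facing a weak opponent who over-plays rock:

```python
import math
import random

ACTIONS = ["rock", "paper", "scissors"]

def payoff(a, b):
    # Payoff for playing action a against action b: +1 win, 0 tie, -1 loss.
    if a == b:
        return 0
    wins = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
    return 1 if (a, b) in wins else -1

def weak_opponent():
    # Hypothetical exploitable opponent: plays rock far too often.
    return random.choices(ACTIONS, weights=[0.8, 0.1, 0.1])[0]

def play_minimax(T):
    # Approach (a): play the minimax strategy (uniform) in every round.
    # Expected payoff is 0 against ANY opponent, weak or strong.
    return sum(payoff(random.choice(ACTIONS), weak_opponent()) for _ in range(T))

def play_exp3(T, eta=0.05):
    # Approach (b): a simplified Exp3-style learner with bandit feedback;
    # it only observes the payoff of the action it actually played.
    weights = [1.0, 1.0, 1.0]
    total = 0
    for _ in range(T):
        s = sum(weights)
        probs = [w / s for w in weights]
        i = random.choices(range(3), weights=probs)[0]
        g = payoff(ACTIONS[i], weak_opponent())
        total += g
        # Importance-weighted update on the chosen arm only.
        weights[i] *= math.exp(eta * g / probs[i])
        # Rescale to keep weights numerically stable.
        m = max(weights)
        weights = [w / m for w in weights]
    return total

random.seed(0)
T = 5000
print("minimax total payoff:", play_minimax(T))  # hovers around 0 in expectation
print("exp3 total payoff:   ", play_exp3(T))     # exploits the opponent's bias
```

The learner's cumulative payoff grows roughly linearly against this biased opponent, while uniform play only breaks even; the trade-off, which the paper's result addresses, is that the learner alone offers no $O(1)$ loss guarantee against a strong opponent.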
Primary Area: Theory->Game Theory
Keywords: online learning, game theory, regret minimization
Submission Number: 6424