Keywords: adversarial bandits, game theory, regret minimization
TL;DR: Combining regret minimization and minimax play makes it possible to exploit weak strategies while risking only constant expected loss.
Abstract: In this paper, we investigate the existence of online learning algorithms with bandit feedback that simultaneously guarantee $O(1)$ regret compared to a given comparator strategy and $\tilde{O}(\sqrt{T})$ regret compared to any fixed strategy, where $T$ is the number of rounds. We provide the first affirmative answer to this question whenever the comparator strategy supports every action. In the context of zero-sum games with min-max value zero, in both normal and extensive form, we show that our results make it possible to risk at most $O(1)$ loss while gaining $\Omega(T)$ from exploitable opponents, thereby combining the benefits of both no-regret algorithms and minimax play.
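To make the setting concrete, the following is a minimal, hypothetical sketch (not the paper's algorithm) of a zero-sum game with min-max value zero: matching pennies, whose minimax strategy is uniform and therefore supports every action, as the comparator assumption requires. A plain loss-based Exp3 learner with bandit feedback exploits a weak opponent for roughly linear payoff but only achieves $\tilde{O}(\sqrt{T})$ regret against the minimax comparator; the paper's contribution is improving that comparator regret to $O(1)$. The payoff matrix, function names, and opponents below are illustrative assumptions only.

```python
import numpy as np

# Matching pennies: a zero-sum game with min-max value zero.
# Row player's payoff matrix; the minimax (comparator) strategy is uniform,
# which supports every action, as assumed for the comparator.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def exp3(T, opponent, rng=None):
    """Plain loss-based Exp3 with bandit feedback (illustrative only;
    it guarantees O(sqrt(T log K)) regret against every fixed strategy,
    including the minimax comparator, not the O(1) of the paper)."""
    rng = rng or np.random.default_rng(0)
    K = A.shape[0]
    eta = np.sqrt(np.log(K) / (T * K))
    log_w = np.zeros(K)
    total_payoff = 0.0
    for _ in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        i = rng.choice(K, p=p)            # our (row) action
        j = opponent(rng)                 # opponent's (column) action
        r = A[i, j]                       # payoff in [-1, 1]
        total_payoff += r
        loss = (1.0 - r) / 2.0            # map payoff to a loss in [0, 1]
        log_w[i] -= eta * loss / p[i]     # importance-weighted loss estimate
    return total_payoff

T = 20000
weak = lambda rng: 0                      # exploitable: always plays column 0
minimax = lambda rng: rng.integers(2)     # plays the equilibrium strategy

print("payoff vs weak opponent   :", exp3(T, weak))      # grows roughly linearly in T
print("payoff vs minimax opponent:", exp3(T, minimax))   # stays near zero
```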
Confirmation: I understand that authors of each paper submitted to EWRL may be asked to review 2-3 other submissions to EWRL.
Serve As Reviewer: ~Adrian_Müller2
Track: Fast Track: published work
Publication Link: admuell@ethz.ch
Submission Number: 49