Non-trivial two-armed partial-monitoring games are bandits

András Antos, Gábor Bartók, Csaba Szepesvári

Published: 2011, Last Modified: 17 May 2024, CoRR 2011, CC BY-SA 4.0

Abstract: We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the learner has two actions available and the game is non-trivial, the game is reducible to a bandit-like game, and thus the minimax regret is $\Theta(\sqrt{T})$.