Keywords: bandit, online learning
Abstract: We consider the Adversarial Multi-Armed Bandits (MAB) problem with unbounded losses, where the algorithms have no prior knowledge on the sizes of the losses. We present \texttt{UMAB-NN} and \texttt{UMAB-G}, two algorithms for non-negative and general unbounded loss respectively. For non-negative unbounded losses, \texttt{UMAB-NN} achieves the first fully adaptive and scale-free regret bound. Building up on that, we further develop \texttt{UMAB-G} that can learn from arbitrary unbounded losses. Complementing the algorithms is a new lower bound, demonstrating a fundamental trade-off between adaptivility and minimax optimality in scale-free MAB. Finally, we perform extensive empirical evaluations, showing that our algorithms consistently out-perform all existing algorithms that handle unbounded losses.
Submission Number: 61
Loading