Keywords: Bandits, Online learning
Abstract: We revisit the problem of stochastic online learning with feedback
graphs, with the goal of devising algorithms that are optimal, up to
constants, both asymptotically and in finite time. We show that,
surprisingly, the notion of optimal finite-time regret is not a
uniquely defined property in this context and that, in general, it
is decoupled from the asymptotic rate. We discuss alternative
choices and propose a notion of finite-time optimality that we argue
is meaningful. For that notion, we give an algorithm that
admits quasi-optimal regret both in finite time and asymptotically.
Supplementary Material: pdf