Beyond Bandit Feedback in Online Multiclass Classification

Dirk van der Hoeven; Federico Fusco; Nicolò Cesa-Bianchi

Beyond Bandit Feedback in Online Multiclass Classification

Dirk van der Hoeven, Federico Fusco, Nicolò Cesa-Bianchi

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 PosterReaders: Everyone

Keywords: Online learning, multiclass classification, bandit algorithms, surrogate losses, feedback graphs

TL;DR: We present new and improved results for the online multiclass classification setting.

Abstract: We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph. While including bandit feedback as a special case, feedback graphs allow a much richer set of applications, including filtering and label efficient classification. We introduce \textproc{Gappletron}, the first online multiclass algorithm that works with arbitrary feedback graphs. For this new algorithm, we prove surrogate regret bounds that hold, both in expectation and with high probability, for a large class of surrogate losses. Our bounds are of order $B\sqrt{\rho KT}$, where $B$ is the diameter of the prediction space, $K$ is the number of classes, $T$ is the time horizon, and $\rho$ is the domination number (a graph-theoretic parameter affecting the amount of exploration). In the full information case, we show that \textproc{Gappletron} achieves a constant surrogate regret of order $B^2K$. We also prove a general lower bound of order $\max\big\{B^2K,\sqrt{T}\big\}$ showing that our upper bounds are not significantly improvable. Experiments on synthetic data show that for various feedback graphs our algorithm is competitive against known baselines.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: zip

Code: zip

11 Replies

Loading