For Universal Multiclass Online Learning, Bandit Feedback and Full Supervision are Equivalent
Abstract: We study the problem of multiclass online learning under $\textit{bandit feedback}$ within the framework of $\textit{universal learning}$ [Bousquet, Hanneke, Moran, van Handel, and Yehudayoff; STOC '21].
In multiclass online learning under bandit feedback, it is well known that no concept class $\mathcal{C}$ is $\textit{uniformly}$ learnable when the effective label space is unbounded; in other words, no online learner guarantees a finite bound on the expected number of mistakes that holds uniformly over all realizable data sequences. In contrast, and perhaps surprisingly, we show that for $\textit{universal}$ learnability of concept classes $\mathcal{C}$, there is an exact equivalence between multiclass online learnability under bandit feedback and under full supervision, in both the realizable and agnostic settings.
More specifically, our first main contribution is a theory that establishes an inherent dichotomy in multiclass online learning under bandit feedback within the realizable setting. In particular, for any concept class $\mathcal{C}$, even when the effective label space is unbounded, we have: (1) If $\mathcal{C}$ does not have an infinite multiclass Littlestone tree, then there is a deterministic online learner that makes only finitely many mistakes against any realizable adversary, crucially without a uniform bound on the number of mistakes. (2) If $\mathcal{C}$ has an infinite multiclass Littlestone tree, then there is a strategy for the realizable adversary that forces any learner, including randomized learners, to make an expected number of mistakes that grows linearly with the number of rounds. Furthermore, our second main contribution reveals a similar dichotomy in the agnostic setting.
Submission Number: 124