Online Sign Identification: Minimization of the Number of Errors in Thresholding Bandits

Reda Ouhamma; Rémy Degenne; Pierre Gaillard; Vianney Perchet

Published: 09 Nov 2021, Last Modified: 20 Oct 2024

NeurIPS 2021 Conference Blind Submission

Readers: Everyone

Keywords: Multi-armed bandit, thresholding bandits

TL;DR: We generalize the thresholding bandit setting and devise a generic algorithm, together with a proof methodology that also applies to existing algorithms in the literature.

Abstract: In the fixed-budget thresholding bandit problem, an algorithm sequentially allocates a fixed budget of samples across several distributions, then predicts whether the mean of each distribution lies above or below a given threshold. We introduce a large family of algorithms inspired by the Frank-Wolfe algorithm, which contains most existing relevant ones, and provide a thorough yet generic analysis of their performance. This analysis allows us to construct new explicit algorithms, for a broad class of problems, whose losses are within a small constant factor of those of the non-adaptive oracle. Interestingly, we observe that adaptive methods empirically greatly outperform non-adaptive oracles, a behavior that is uncommon in standard online learning settings such as regret minimization. We explain this surprising phenomenon on an insightful toy problem.
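
To make the problem setting concrete, here is a minimal Python sketch of a fixed-budget thresholding bandit with a uniform (round-robin) allocation. It is only a non-adaptive baseline for illustration, not the Frank-Wolfe-inspired family from the paper. The function names, the Gaussian reward model, and the `noise_std` parameter are all assumptions made for this example.

```python
import random


def uniform_thresholding(means, threshold, budget, noise_std=1.0, seed=0):
    """Round-robin baseline for a fixed-budget thresholding bandit.

    Hypothetical helper, NOT the paper's algorithm. Assumes Gaussian
    rewards with standard deviation `noise_std` (an assumption).
    Returns one predicted sign per arm: +1 if the empirical mean is
    at or above the threshold, -1 otherwise.
    """
    rng = random.Random(seed)
    k = len(means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        arm = t % k  # non-adaptive: cycle through the arms in order
        sums[arm] += rng.gauss(means[arm], noise_std)
        counts[arm] += 1
    preds = []
    for total, n in zip(sums, counts):
        # An arm that was never sampled falls back to predicting +1.
        estimate = total / n if n > 0 else threshold
        preds.append(1 if estimate >= threshold else -1)
    return preds


def count_errors(means, threshold, preds):
    """Number of arms whose predicted sign differs from the true sign."""
    true_signs = [1 if m >= threshold else -1 for m in means]
    return sum(1 for s, p in zip(true_signs, preds) if s != p)


if __name__ == "__main__":
    arm_means = [0.2, 0.45, 0.55, 0.8]  # illustrative values only
    predictions = uniform_thresholding(arm_means, threshold=0.5, budget=400)
    print(predictions, count_errors(arm_means, 0.5, predictions))
```

An adaptive method would replace the `t % k` rule with a choice that depends on the samples observed so far, which is where the algorithms discussed in the abstract differ from this baseline.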

Supplementary Material: zip

Submission History: No

Checklist: Yes, we completed the NeurIPS 2021 paper checklist, and have included it in our PDF.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Code: zip
