Classification Bandits: Classification Using Expected Rewards as Imperfect Discriminators

Koji Tabata, Atsuyoshi Nakamura, Tamiki Komatsuzaki

2021 (modified: 05 Nov 2022) | PAKDD (Workshops) 2021 | Readers: Everyone

Abstract: The classification bandits problem is a new class of multi-armed bandit problems in which an agent must classify a given set of arms as positive or negative, depending on whether the number of bad arms is at least $N_2$ or at most $N_1$ ($< N_2$), using as few arm draws as possible. In our problem setting, bad arms are imperfectly characterized as the arms with above-threshold expected rewards (losses). We develop a method of reducing classification bandits to simpler one-threshold classification bandits and propose an algorithm for the problem that classifies a given set of arms correctly with a specified confidence. Our numerical experiments demonstrate the effectiveness of our proposed method.
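
To make the problem setting concrete, the following is a minimal sketch of a naive baseline, not the algorithm proposed in the paper. It assumes Bernoulli losses, a known threshold `theta`, and that "positive" corresponds to at least `n2` bad arms and "negative" to at most `n1`. The function name `classify_arms`, its parameters, and the Hoeffding-style confidence radius are all illustrative assumptions that do not come from the abstract.

```python
import math
import random


def classify_arms(means, theta, n1, n2, delta=0.05, max_rounds=100_000, seed=0):
    """Naive round-robin baseline (hypothetical, not the paper's method).

    means : true expected losses of the arms (hidden from a real agent,
            used here only to simulate Bernoulli draws)
    theta : threshold; an arm is treated as "bad" if its expected loss > theta
    n1,n2 : answer "negative" if at most n1 arms are bad,
            "positive" if at least n2 arms are bad (n1 < n2)
    Returns (label, total_draws).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    undecided = set(range(k))
    bad = 0   # arms whose lower confidence bound exceeds theta
    good = 0  # arms whose upper confidence bound is below theta
    draws = 0

    for _ in range(max_rounds):
        for i in sorted(undecided):
            # Simulate one Bernoulli loss from arm i.
            sums[i] += 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
            draws += 1

            n = counts[i]
            mean = sums[i] / n
            # Hoeffding radius with a union bound over arms and sample sizes
            # (an assumed, conservative choice).
            radius = math.sqrt(math.log(4 * k * n * n / delta) / (2 * n))

            if mean - radius > theta:
                undecided.discard(i)
                bad += 1
            elif mean + radius < theta:
                undecided.discard(i)
                good += 1

        # Stop as soon as the count of bad arms is settled on one side.
        if bad >= n2:
            return "positive", draws
        if k - good <= n1:  # at most n1 arms can still be bad
            return "negative", draws
        if not undecided:
            break

    return "undecided", draws
```

Calling `classify_arms([0.9, 0.8, 0.85, 0.1, 0.2], theta=0.5, n1=1, n2=3)` returns a label together with the number of draws used. The early-stopping checks illustrate why the task can be cheaper than classifying every arm individually: once enough arms are confidently bad (or confidently good), the remaining arms need no further draws.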
